Abstract

Establishing the convergence of splines can be cast as a variational problem which is amenable to a $\Gamma$-convergence approach. We consider the case in which the regularization coefficient scales with the number of observations, $n$, as $\lambda_n = n^{-p}$. Using standard theorems from the $\Gamma$-convergence literature, we prove that the general spline model is consistent in that estimators converge in a sense slightly weaker than weak convergence in probability for $p \leq \frac{1}{2}$. Without further assumptions we show this rate is sharp. This differs from rates for strong convergence using Hilbert scales where one can often choose $p > \frac{1}{2}$.

Keywords: Variational methods, $\Gamma$-convergence, pointwise convergence, general spline model, nonparametric smoothing.


1 Introduction 

Given a Hilbert space, $\mathcal{H}$, with dual $\mathcal{H}^*$, the general spline problem [22], [44] is to recover $\mu^\dagger \in \mathcal{H}$ from observations, $\{(L_i, y_i)\}_{i=1}^n \subset \mathcal{H}^* \times \mathbb{R}$, and the model

$$y_i = L_i \mu^\dagger + \epsilon_i, \tag{1}$$

where $\epsilon_i$ and $L_i$ are independent random variables taking values in $\mathbb{R}$ and $\mathcal{H}^*$, respectively. We assume that $\mathcal{H}$ can be decomposed into $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$ where, for $l = 0, 1$, $(\mathcal{H}_l, \|\cdot\|_l)$ are themselves both Hilbert spaces. For example, one may apply the theory to the special spline problem (also referred to as smoothing splines) where $\mathcal{H} = H^m([0,1])$ ($m \geq 1$) is the Sobolev space of degree $m$ and the observation operators are of the form $L_i \mu = \mu(t_i)$, in which $t_i$ is sampled from some distribution over $[0,1]$. Throughout this paper we refer to (1) as the general spline model when $L_i \in \mathcal{H}^*$ and $\mathcal{H}$ is any Hilbert space, and as the special spline model when $L_i$ is the pointwise evaluation operator and $\mathcal{H} = H^m$.

Establishing convergence and the rate of convergence of estimates $\mu^n$ of $\mu^\dagger$ remains a current area of research [3,4,9,17,20,24,27,47]. These results establish strong convergence, in the sense of convergence with respect to a norm, and related rates for the special spline problem. Convergence with respect to the norm in the original space is typically not achievable, so convergence results are in weaker topologies (equivalently, larger spaces). This paper fills a gap in the literature by establishing the convergence of the general spline problem in the original space in the sense that for all $F \in \mathcal{H}^*$, $F(\mu^n)$ converges in probability to $F(\mu^\dagger)$. There exist results for pointwise convergence of the special spline problem with equally spaced ($t_i = \frac{i}{n}$) data points [26,33,48,50,51]. Our results do not assume data points are equally spaced (we do, however, require that they are iid) and we consider the general case where the $L_i$ are bounded and linear operators (not necessarily pointwise evaluation).

We assume that $\dim(\mathcal{H}_0) = m < \infty$ and $\dim(\mathcal{H}_1) = \infty$. This can be seen as a multi-scale decomposition of $\mathcal{H}$. The projection of a function $\mu \in \mathcal{H}$ into the subspace $\mathcal{H}_0$ is a coarse approximation of that function. Continuing with the special spline example, one can write

$$\mu(t) = \sum_{i=0}^{m-1} \frac{\nabla^i \mu(0)}{i!} t^i + \int_0^1 \frac{(t-u)_+^{m-1}}{(m-1)!} \nabla^m \mu(u) \, \mathrm{d}u$$

for any $\mu \in H^m$. The space $\mathcal{H}_0$ is then the space of polynomials of degree at most $m-1$. Hence $\dim(\mathcal{H}_0) = m$. Imposing a penalty on the $\mathcal{H}_1$ space, we construct a sequence of estimators $\mu^n$ of $\mu^\dagger$ as the minimizers of

$$f_n(\mu) = \frac{1}{n} \sum_{i=1}^n |y_i - L_i \mu|^2 + \lambda_n \|\chi_1 \mu\|_1^2$$

where $\chi_i : \mathcal{H} \to \mathcal{H}_i$ ($i = 0, 1$) is the projection of $\mathcal{H}$ onto $\mathcal{H}_i$. This paper addresses the asymptotic behaviour (as $n \to \infty$) of the general spline problem and in particular how one should choose $\lambda_n$ to ensure $\mu^n$ converges to $\mu^\dagger$ (in the weak sense that for all $F \in \mathcal{H}^*$, $F(\mu^n)$ converges in probability to $F(\mu^\dagger)$). An alternative, but closely related, method is the penalized spline problem, for example [14], where the estimate $\mu^n$ is found by minimizing $f_n$ over functions of the form $\mu = \sum_{i=1}^{l} a_i B_i$, where the $B_i$ are a set of $B$-splines, and penalising the coefficients $a_i$ or derivatives of $\mu$. Typically $l \ll n$, so the complexity of the problem decreases.
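To make the objective concrete, the following sketch minimizes a discretized version of $f_n$ for the special spline model with $m = 2$. It is an illustration under our own assumptions, not the method analysed in this paper: $\mu$ is represented by its values on a uniform grid of $[0,1]$, each $t_i$ is snapped to the nearest grid point, and $\|\chi_1 \mu\|_1^2 = \|\nabla^2 \mu\|_{L^2}^2$ is approximated by a Riemann sum of squared second differences. The function `fit_discrete_spline` and its `grid_size` parameter are hypothetical names.

```python
import numpy as np


def fit_discrete_spline(t, y, lam, grid_size=100):
    """Minimise (1/n) sum_i (y_i - mu(t_i))^2 + lam * ||mu''||_{L^2}^2 on a grid.

    Hypothetical discretisation (an assumption, not from the text): mu is stored
    as its values on `grid_size` equally spaced points of [0, 1], each t_i is
    snapped to its nearest grid point, and mu'' is replaced by second differences.
    Returns the grid and the fitted values of mu on it.
    """
    n = len(t)
    grid = np.linspace(0.0, 1.0, grid_size)
    h = grid[1] - grid[0]

    # Evaluation matrix E: row i picks out mu at the grid point nearest t_i,
    # playing the role of the observation functional L_i.
    idx = np.clip(np.rint(np.asarray(t) / h).astype(int), 0, grid_size - 1)
    E = np.zeros((n, grid_size))
    E[np.arange(n), idx] = 1.0

    # Second-difference operator approximating mu''; its null space is the
    # linear functions, i.e. the discrete analogue of H_0 when m = 2.
    D = np.zeros((grid_size - 2, grid_size))
    for k in range(grid_size - 2):
        D[k, k:k + 3] = [1.0, -2.0, 1.0]
    D /= h**2

    # Normal equations: ((1/n) E^T E + lam * h * D^T D) mu = (1/n) E^T y,
    # where h * ||D mu||^2 is a Riemann-sum approximation of ||mu''||^2.
    A = E.T @ E / n + lam * h * (D.T @ D)
    b = E.T @ np.asarray(y) / n
    return grid, np.linalg.solve(A, b)


rng = np.random.default_rng(0)
n = 500
t = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(n)
grid, mu = fit_discrete_spline(t, y, lam=n ** -0.5)  # lambda_n = n^{-p} with p = 1/2
```

Because the discrete penalty vanishes on linear functions, the linear system is only uniquely solvable when the observed grid points pin down the linear part, which loosely mirrors the identifiability requirement on $\mathcal{H}_0$ imposed later in Assumption 3.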

There are two bodies of literature on the specification of $\lambda_n$. On the one hand there are methods which define $\lambda_n$ as the minimizer of some loss function, for example average square error. This class of techniques includes cross-validation [45], generalized cross-validation [12] and penalized likelihood techniques [18,19,23,29,32,43]. These methods provide a numerical value of $\lambda_n$ for a given $n$ and a given set of data. In the case of special splines there are many results on the asymptotic behavior of $\lambda_n$ and $\mu^n$ for these methods, see for example [1,10,12,25,36,40,41,46]. The alternative approach, and the one we take in this paper, is to choose a sequence such that the estimates $\mu^n$ converge to $\mu^\dagger$ in an appropriate sense at the fastest possible rate. This strategy gives a scaling regime for $\lambda_n$, but it does not in general give specific numerical values of $\lambda_n$, i.e. it provides the optimal rate of convergence but not the associated multiplicative constant.

When considering strong convergence many results in the literature demonstrate $\mu^n \to \mu^\dagger$ in a norm via the use of Hilbert scales; see, for example, [11,30,31,35,37,42]. It is not typically possible to obtain strong convergence with respect to the original norm and it is common to resort to the use of weaker norms; for example, in the special spline problem, one starts with the space $H^s$ but looks for convergence in $L^2$. The alternative, which is pursued in this paper, is to consider modes of convergence related to weak convergence in the original space, $\mathcal{H}$.

Note that for special splines strong convergence in a larger space is a weaker result than weak convergence in the original space: by the Sobolev embedding theorem, weak convergence in $H^s$ implies strong convergence in $L^2$; however, the converse does not hold.

In this paper we show that the estimators of the general spline problem converge, $\mu^n \to \mu^\dagger$, in a sense slightly weaker than convergence weakly in probability in the large data limit, for regularization $\lambda_n$ that scales to zero no faster than $n^{-\frac{1}{2}}$. In this scaling regime we say that the general spline problem is consistent. For insufficient regularization the spline estimators may in some sense 'blow up'. In particular, for scaling outside this regime we construct (uniformly bounded) observation operators $L_i$ such that $\mathbb{E}[\|\mu^n\|^2] \to \infty$. Hence without further assumptions our results are sharp.

We note that these results have practical implications. If we are interested in estimating $\mu^\dagger$ at a point $t$ then we let $F(\mu) = \mu(t)$ where $F \in \mathcal{H}^*$. In this setting weak convergence, or the pointwise form considered in this paper, is the natural mode of convergence to consider. If, instead, one is interested in a global approximation of $\mu^\dagger$, then convergence of $\mu^n - \mu^\dagger$ in an appropriate norm is more relevant. The two formulations imply different scaling results for $\lambda_n$.

There are many results in the ill-posed inverse problems literature that may be applied to the strong convergence of the general spline problem; for brevity we only mention those most relevant to this work. In [43] two different methods of estimating $\lambda_n$ were compared as $n \to \infty$ using the general spline formulation. The reproducing kernel Hilbert space setting was used in [21], which also discussed the probabilistic interpretation behind the estimator $\mu^n$. In [11,30] the authors prove the strong convergence and optimal rates for the spline model using an approximation $\frac{1}{n} \sum_{i=1}^n L_i^* L_i \approx U$ where $U$ is compact, positive definite, self-adjoint and with dense inverse. See also [8,28], which consider ill-posed inverse problems without noise using similar methods. In these papers the scaling regime for $\lambda_n$ is given in terms of the rate of decay of the eigenvalues of the inverse covariance (regularization) operator $C^{-1}$ (where $\|\cdot\|_1 = \|C^{-1} \cdot\|_{L^2}$).

There are many more recent results addressing the asymptotic properties of splines, including [9,17,20,24,26,33,47,48,50,51]. Many of these recent results concern the asymptotics of penalized splines, where one fixes the number of knot points, as opposed to the smoothing spline case where the number of knots is equal to the number of data points.

It is known that the special spline problem is equivalent to a white noise problem [7]. Strong convergence and rates for the white noise problem have been well studied; see, for example, [2,4,16] and references therein.

An interesting related result, due to Silverman [34], gives the convergence of the smoothing kernel. That is, we can write the estimator $\mu^n$ of $\mu^\dagger$ given data $\{(t_i, y_i)\}_{i=1}^n$ in the form

$$\mu^n(s) = \frac{1}{n} \sum_{i=1}^n K_n(s, t_i) y_i$$

for a kernel $K_n$ (see Lemma 2.8). Silverman showed that $K_n(\cdot, t)$ converges to some $K(\cdot, t)$ uniformly on $[\epsilon, 1 - \epsilon]$ for every $\epsilon > 0$ and each $t$ (the result is valid for the special spline model when penalising the second derivative). Whilst this result gives intuition into how the kernel behaves, it does not imply the convergence of the smoothing spline. Indeed, the convergence is not valid at the end points $\{0, 1\}$ and does not account for randomness in the observations $y_i$. In other words, $K_n(\cdot, t) \to K(\cdot, t)$ does not imply the convergence of $\mu^n$ (or any characterisation of the limit, such as we give in this paper, as a solution to a variational problem). Silverman's result is, however, valid for a larger range of $\lambda$ than we have here. For convergence of the kernel it is enough that $\frac{1}{\lambda} = o(n^{2-\delta})$ for any $\delta > 0$. Our results concerning the pointwise convergence of the smoothing spline hold for $\lambda$ satisfying $\frac{1}{\lambda} = O(n^{\frac{1}{2}})$.

One advantage of our approach is that we gain intuition into what happens when $\lambda_n \to 0$ too quickly. Our results show a critical rate, with respect to the scaling of $\lambda_n$: the methodology is ill-posed when $\lambda_n$ decays faster than this rate and well-posed when it decays at this rate or slower. The second advantage of our approach is that, by using the $\Gamma$-convergence framework, as long as we can show that minimizers are uniformly bounded the convergence follows easily (we also need to show that the $\Gamma$-limit has a unique minimizer, but for our problem this is not difficult). This is easier than showing directly that $\mu^n - \mu^\dagger$ converges to zero. We are consequently able to employ simpler assumptions than those required by more direct arguments.

The outline of this paper is as follows. In the next section we introduce some preliminary material. This starts by defining the notation we use in the remainder of the paper. We then remind the reader of Gateaux derivatives, the $\Gamma$-convergence framework and the spline methodology. Section 3 contains the results for the convergence of the general spline model under appropriate conditions on the scaling in the regularization using the $\Gamma$-convergence framework. We discuss the special spline model in Section 4.


2 Preliminary Material 

2.1 Notation 


We use the following standard definitions for rates of convergence. 
Definition 2.1. We define the following.

(i) For deterministic sequences $a_n$ and $r_n$, where the $r_n$ are positive and real valued, we write $a_n = O(r_n)$ if $\frac{a_n}{r_n}$ is bounded. If $\frac{a_n}{r_n} \to 0$ as $n \to \infty$ we write $a_n = o(r_n)$.

(ii) For random sequences $a_n$ and $r_n$, where the $r_n$ are positive and real valued, we write $a_n = O_p(r_n)$ if $\frac{a_n}{r_n}$ is bounded in probability: for all $\epsilon > 0$ there exist $M_\epsilon, N_\epsilon$ such that

$$\mathbb{P}\left( \left| \frac{a_n}{r_n} \right| > M_\epsilon \right) < \epsilon \quad \forall n \geq N_\epsilon.$$

If $\frac{a_n}{r_n} \to 0$ in probability, i.e. for all $\epsilon > 0$

$$\mathbb{P}\left( \left| \frac{a_n}{r_n} \right| > \epsilon \right) \to 0 \quad \text{as } n \to \infty,$$

we write $a_n = o_p(r_n)$.

Definition 2.2. For deterministic positive sequences $a_n$ and $b_n$ we write $a_n \lesssim b_n$ to mean there exists $M < \infty$ such that $a_n \leq M b_n$ for all $n$.
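As a quick illustration of Definition 2.1(ii), the sample mean of iid centered variables with finite variance is $O_p(n^{-1/2})$. The sketch below assumes standard normal $X_i$ (our choice, purely for illustration) and reports an empirical quantile of $|\frac{1}{n}\sum_{i=1}^n X_i| / n^{-1/2}$; boundedness in probability shows up as quantiles that do not grow with $n$ (here they stay near $1.96$). The function name is hypothetical.

```python
import numpy as np


def scaled_mean_quantile(n, reps=2000, level=0.95, seed=0):
    """Empirical `level`-quantile of |mean(X_1, ..., X_n)| / r_n with r_n = n^{-1/2}.

    Assumption for illustration: the X_i are iid standard normal, so that
    sqrt(n) * mean is exactly N(0, 1) and the 95% quantile of its modulus is 1.96.
    """
    rng = np.random.default_rng(seed)
    means = rng.standard_normal((reps, n)).mean(axis=1)
    return np.quantile(np.abs(means) * np.sqrt(n), level)


for n in (10, 100, 1000):
    print(n, round(scaled_mean_quantile(n), 2))
```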

Throughout this paper we say that a sequence of parameter estimators is consistent if, for any value 
of the “parameters” (splines in our setting), they converge in the sense made precise in Theorem 3.1 to 
the true value. 

We will assume $\epsilon_i$ and $L_i$ are independent sequences of iid random variables. Our estimators $\mu^n$ are also random variables and therefore we can reach only probabilistic conclusions about the convergence of $\mu^n$.

We will work on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ rich enough to support a countably infinite sequence of observations $(L_i, y_i)_{i \geq 1}$. All stochastic quantifiers are taken with respect to $\mathbb{P}$ unless otherwise stated. It will be convenient to introduce the natural filtration associated with the marginal sequence $(L_i)$ and we define, for $n \in \mathbb{N}$, $\mathcal{G}_n = \sigma(L_1, \ldots, L_n)$, a sequence of sub-$\sigma$-algebras of $\mathcal{F}$. We use $\mathbb{E}[\cdot \,|\, \mathcal{G}_n]$ to denote a version of the associated conditional expectation.

To emphasize the dependence of our functionals on the realization $\omega \in \Omega$, and hence on the data sequence, we write $f_n^{(\omega)}$.

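For an operator $U : \mathcal{H} \to \mathcal{H}$ we will use $\mathrm{Ran}(U)$ to denote the range of $U$, i.e.

$$\mathrm{Ran}(U) = \{\mu \in \mathcal{H} : \exists \nu \in \mathcal{H} \text{ s.t. } U\nu = \mu\}.$$

When $U$ is linear the operator norm is defined by

$$\|U\|_{\mathcal{L}(\mathcal{H}, \mathcal{H})} := \sup_{\|\nu\| \leq 1} \|U\nu\|.$$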
M<i 
We denote the support of a probability measure $\phi$ on a topological space $X$ endowed with its Borel $\sigma$-algebra by $\mathrm{supp}(\phi)$, i.e.

$$\mathrm{supp}(\phi) = \bigcap \left\{ X' : X' \subseteq X,\ X' \text{ is closed, and } \int_{X \setminus X'} \phi(\mathrm{d}x) = 0 \right\}.$$

A sequence of probability measures $P_n$ on a Polish space is said to weakly converge to a probability measure $P$ if for all bounded and continuous functions $h$ we have

$$P_n h \to P h,$$

where we write $Ph = \int h(x) \, P(\mathrm{d}x)$. If $P_n$ weakly converges to $P$ then we write $P_n \Rightarrow P$.


2.2 The Gateaux Derivative 


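Definition 2.3. We say that $f : \mathcal{H} \to \mathbb{R}$ is Gateaux differentiable at $\mu \in \mathcal{H}$ in direction $\nu \in \mathcal{H}$ if the limit

$$\partial f(\mu; \nu) = \lim_{r \to 0} \frac{f(\mu + r\nu) - f(\mu)}{r}$$

exists. We may define second order derivatives by

$$\partial^2 f(\mu; \nu, \nu') = \lim_{r \to 0} \frac{\partial f(\mu + r\nu'; \nu) - \partial f(\mu; \nu)}{r}$$

for $\mu, \nu, \nu' \in \mathcal{H}$, and similarly for higher order derivatives. To simplify notation, when it is clear, we write

$$\partial^s f(\mu; \nu) := \partial^s f(\mu; \nu, \ldots, \nu).$$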
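Definition 2.3 translates directly into difference quotients. The sketch below, an illustration with a functional on $\mathbb{R}^3$ of our own choosing, approximates $\partial f(\mu;\nu)$ and $\partial^2 f(\mu;\nu,\nu')$ by the one-sided quotients in the definition (with a small fixed $r$ in place of the limit) and compares them with the exact derivatives. The helper names `gateaux` and `gateaux2` are hypothetical.

```python
import numpy as np


def gateaux(f, mu, v, r=1e-6):
    """One-sided difference quotient (f(mu + r v) - f(mu)) / r from Definition 2.3."""
    return (f(mu + r * v) - f(mu)) / r


def gateaux2(f, mu, v, w, r=1e-5):
    """Second derivative: difference quotient of the first derivative in direction w."""
    return (gateaux(f, mu + r * w, v, r) - gateaux(f, mu, v, r)) / r


# Illustrative functional on R^3 (an assumption, not from the text):
# f(x) = sum_k x_k^4 / 4 + x_0 x_1.
def f(x):
    return 0.25 * np.sum(x**4) + x[0] * x[1]


def df_exact(mu, v):
    return np.sum(mu**3 * v) + mu[0] * v[1] + mu[1] * v[0]


def d2f_exact(mu, v, w):
    return np.sum(3 * mu**2 * v * w) + v[0] * w[1] + v[1] * w[0]


mu = np.array([1.0, 2.0, -1.0])
v = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.0, 1.0])
print(gateaux(f, mu, v), df_exact(mu, v))          # both approximately -9.5
print(gateaux2(f, mu, v, w), d2f_exact(mu, v, w))  # both approximately 6.5
```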

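Theorem 2.4 (Taylor's Theorem). If $f : \mathcal{H} \to \mathbb{R}$ is $m$ times continuously Gateaux differentiable on a convex subset $K \subseteq \mathcal{H}$ then, for $\mu, \nu \in K$,

$$f(\nu) = f(\mu) + \partial f(\mu; \nu - \mu) + \frac{1}{2!} \partial^2 f(\mu; \nu - \mu) + \cdots + \frac{1}{(m-1)!} \partial^{m-1} f(\mu; \nu - \mu) + R_m(\mu, \nu - \mu)$$

where

$$R_m(\mu, \nu - \mu) = \frac{1}{(m-1)!} \int_0^1 (1-t)^{m-1} \, \partial^m f((1-t)\mu + t\nu; \nu - \mu) \, \mathrm{d}t.$$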


2.3 $\Gamma$-Convergence

Variational methods, and in particular $\Gamma$-convergence, have been used by the authors previously to prove consistency of estimators which arise as solutions to a variational problem [38,39]. We have the following definition of $\Gamma$-convergence with respect to weak convergence.

Definition 2.5 ($\Gamma$-convergence [6, Definition 1.5]). Let $\mathcal{H}$ be a Banach space. A sequence $f_n : \mathcal{H} \to \mathbb{R} \cup \{\pm\infty\}$ is said to $\Gamma$-converge on the domain $\mathcal{H}$ to $f_\infty : \mathcal{H} \to \mathbb{R} \cup \{\pm\infty\}$ with respect to weak convergence on $\mathcal{H}$, and we write $f_\infty = \Gamma\text{-}\lim_n f_n$, if for all $\nu \in \mathcal{H}$ we have

(i) (lim inf inequality) for every sequence $(\nu_n)$ weakly converging to $\nu$

$$f_\infty(\nu) \leq \liminf_{n \to \infty} f_n(\nu_n);$$

(ii) (recovery sequence) there exists a sequence $(\nu_n)$ weakly converging to $\nu$ such that

$$f_\infty(\nu) \geq \limsup_{n \to \infty} f_n(\nu_n).$$


When it exists the $\Gamma$-limit is always weakly lower semi-continuous [6, Proposition 1.31] and therefore the minimum of the $\Gamma$-limit over weakly compact sets is achieved. An important property of $\Gamma$-convergence is that it implies the convergence of almost minimizers, where $\mu^n$ is a sequence of almost minimizers of $f_n$ if there exists a sequence $\delta_n$ with $\delta_n \to 0$ and $f_n(\mu^n) \leq \inf f_n + \delta_n$. In particular, we will make use of the following well known result which can be found in [6, Theorem 1.21].

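Theorem 2.6 (Convergence of Minimizers). Let $f_n : \mathcal{H} \to \mathbb{R} \cup \{\pm\infty\}$ be a sequence of functionals on a Banach space $(\mathcal{H}, \|\cdot\|)$. Assume there exists a weakly compact subset $K \subset \mathcal{H}$ with

$$\inf_{\mathcal{H}} f_n = \inf_K f_n \quad \forall n \in \mathbb{N}.$$

If $f_\infty = \Gamma\text{-}\lim_n f_n$ and $f_\infty$ is not identically $\pm\infty$ then

$$\min_{\mathcal{H}} f_\infty = \lim_{n \to \infty} \inf_{\mathcal{H}} f_n.$$

Furthermore, if $\mu^n \in K$ are almost minimizers of $f_n$ then any weak limit point minimizes $f_\infty$.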

A simple consequence of the above is the following corollary which avoids recourse to subsequences. 

Corollary 2.7. If, in addition to the assumptions of Theorem 2.6, the minimizer of the $\Gamma$-limit is unique, then any sequence of almost minimizers $\mu^n$ of $f_n$ converges weakly to the minimizer of $f_\infty$.


2.4 The Spline Framework 

In this subsection we recap the spline methodology and find an explicit representation for our estimators. 
In particular we construct our estimate as a minimizer of a quadratic functional. We will show the 
existence and uniqueness of the minimizer. 

We consider the separable Hilbert space $\mathcal{H}$ with inner product and norm given by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ respectively. We assume we can write $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$ where $(\mathcal{H}_0, \langle \cdot, \cdot \rangle_0, \|\cdot\|_0)$ and $(\mathcal{H}_1, \langle \cdot, \cdot \rangle_1, \|\cdot\|_1)$ are Hilbert spaces with $\dim(\mathcal{H}_0) = m$ and $\dim(\mathcal{H}_1) = \infty$. We may write

$$\|\mu\|^2 = \|\mu\|_0^2 + \|\mu\|_1^2;$$

it is convenient to extend the domain of $\|\cdot\|_i$ from $\mathcal{H}_i$ to $\mathcal{H}$, setting $\|\mu\|_i := \|\chi_i \mu\| = \|\chi_i \mu\|_i$, as $\mathcal{H}_0$ is orthogonal to $\mathcal{H}_1$ by assumption. For example, in the special spline case, $\mathcal{H}_0$ is the space of polynomials of degree at most $m - 1$ and $\mathcal{H}_1$ will be the space of remainder terms

$$R(t) = \mu(t) - \sum_{i=0}^{m-1} \frac{\nabla^i \mu(0)}{i!} t^i.$$

The norm on $\mathcal{H}_1$ is $\|\mu\|_1 = \|\nabla^m \mu\|_{L^2}$. Now the projection of a function $\mu \in \mathcal{H}$ to $\mathcal{H}_1$ is just the map $\mu \mapsto R$ given by the above expression. Clearly $\|\mu\|_1 = \|R\|_1 = \|\chi_1 \mu\|_1$. Since $\mathcal{H}_0$ is finite dimensional we are free to choose the norm without changing the topology; however, it is convenient to choose a norm for which $\mathcal{H}_0$ is orthogonal to $\mathcal{H}_1$ when both are viewed as subspaces of $\mathcal{H}$. A natural choice is $\|\mu\|_0^2 = \sum_{i=0}^{m-1} |\nabla^i \mu(0)|^2$. The special spline problem is discussed further below, particularly in Section 4.

We wish to estimate $\mu^\dagger \in \mathcal{H}$ given observations of the form $(L_i, y_i)$, where $L_i$ (as well as $y_i$) is random. For convenience we summarize the general spline model in the definition below. One can also see, for example, [44] for more details on the general spline model.
The General Spline Model. The general spline model is given by (1) where $L_i \in \mathcal{H}^*$ are random variables and $\epsilon_i$ are iid random variables from a centered distribution, $\phi_0$, with variance $\sigma^2$. The $L_i$ are assumed to be observed without noise and to be members of a family indexed by $\mathcal{I} \subseteq \mathbb{R}^d$; we write $L_t$ to mean the operator $L$ which depends upon a parameter $t \in \mathcal{I}$. The 'randomness' of $L$ is characterized by the distribution, $\phi_T$, of a random index $t \in \mathcal{I}$. For a sample $t_i \sim \phi_T$ we write $L_i$ as shorthand for $L_{t_i}$. The operator $L_i$ is therefore interpreted as a realization of $L_{t_i}$. We assume that the $t_i$ are independent and for convenience we define $\phi_{L_t \mu^\dagger}$ to be the distribution $\phi_0$ shifted by $-L_t \mu^\dagger$. By the Riesz Representation Theorem there exists $\eta_t \in \mathcal{H}$ such that $L_t \mu = \langle \eta_t, \mu \rangle$ for all $\mu \in \mathcal{H}$. The sequence of observed data points $(t_1, y_1), (t_2, y_2), \ldots$ is a realization of a sequence of random elements on $(\Omega, \mathcal{F}, \mathbb{P})$. To mitigate the notational burden, we suppress the $\omega$-dependence of $t_i$, $y_i$ and $L_i$.

For example, in the case of special splines $L_i \mu^\dagger = \mu^\dagger(t_i)$ for some $t_i$, a random variable distributed in $[0,1]$. Observing $L_i$ without noise is equivalent here to observing $t_i$ without noise. We refer to Section 4 for more details.

We take our sequence of estimators $\mu^n$ of $\mu^\dagger$ as minimizers, which are subsequently shown to be unique, of $f_n^{(\omega)}$, where

$$f_n^{(\omega)}(\mu) = \frac{1}{n} \sum_{i=1}^n (y_i - L_i \mu)^2 + \lambda_n \|\mu\|_1^2. \tag{2}$$

By completing the square we can easily show that $\mu^n$ is given implicitly by

$$G_{n,\lambda_n} \mu^n = \frac{1}{n} \sum_{i=1}^n \eta_i y_i$$

where

$$G_{n,\lambda} = \frac{1}{n} \sum_{i=1}^n \eta_i L_i + \lambda \chi_1, \tag{3}$$

and for clarity we also suppress the $\omega$-dependence of $G_{n,\lambda}$ from the notation. It will be necessary in our proofs to bound $\|G_{n,\lambda_n}\|_{\mathcal{L}(\mathcal{H}, \mathcal{H})}$ in terms of $\lambda_n$ (for almost every sequence of observations). We do this by imposing a bound on $\|L_t\|_{\mathcal{H}^*}$, or equivalently on $\|\eta_t\|$, for almost every $t \in \mathcal{I}$. See Section 4 for a discussion of the special spline problem and in particular how one can find $\eta_t$. In order to bound the $\mathcal{H}_0$ norm of $\mu^n$ we need conditions on our observation operators $L_t$. In particular, we will use the observation operators to define a norm on $\mathcal{H}_0$. Hence our proofs require a uniqueness assumption on $L_t$ in $\mathcal{H}_0$ (Assumption 3 below). It is not enough that the $L_t$ are unique over $\mathcal{H}$, as this would not necessarily contain any information on the $\mathcal{H}_0$ projection of $\mu^n$, e.g. if $L_t \mu = L_t \chi_1 \mu$ for all $\mu \in \mathcal{H}$. For clarity and future reference we now summarize the assumptions described in the previous paragraphs.

Assumptions: We make the following assumptions on $f_n^{(\omega)} : \mathcal{H} \to \mathbb{R}$ defined by (2) and on $\mathcal{H}$.

1. Let $(\mathcal{H}, \langle \cdot, \cdot \rangle, \|\cdot\|)$ be a separable Hilbert space with $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$, where $(\mathcal{H}_0, \langle \cdot, \cdot \rangle_0, \|\cdot\|_0)$ and $(\mathcal{H}_1, \langle \cdot, \cdot \rangle_1, \|\cdot\|_1)$ are Hilbert spaces. Assume $\dim(\mathcal{H}) = \dim(\mathcal{H}_1) = \infty$ and $\dim(\mathcal{H}_0) = m < \infty$.

2. The distribution of $L_i := L_{t_i}$ is specified implicitly by that of $t_i \in \mathcal{I} \subseteq \mathbb{R}^d$ and we assume $t_i \overset{\text{iid}}{\sim} \phi_T$.

3. We assume $|\mathrm{supp}(\phi_T)| > m$ and that the $L_t$ are unique in $\mathcal{H}_0$ in the sense that if $L_t \mu = L_r \mu$ for all $\mu \in \mathcal{H}_0$ then $t = r$.

4. There exists $\alpha > 0$ such that $\|\eta_t\| = \|L_t\|_{\mathcal{H}^*} \leq \alpha$ for $\phi_T$-almost every $t \in \mathcal{I}$.
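The implicit characterization $G_{n,\lambda_n}\mu^n = \frac{1}{n}\sum_{i=1}^n \eta_i y_i$ can be checked numerically in a finite-dimensional stand-in for $\mathcal{H}$. In the sketch below, which is an illustration under our own assumptions, $\mathcal{H} = \mathbb{R}^D$ with the Euclidean inner product, $\mathcal{H}_0$ is spanned by the first $m$ coordinates, $\chi_1$ zeroes those coordinates, and $L_i\mu = \langle \eta_i, \mu \rangle$ with randomly drawn representers $\eta_i$ rescaled so that $\|\eta_i\| = \alpha$ (Assumption 4). The helper names `spline_estimate` and `objective` are hypothetical.

```python
import numpy as np


def spline_estimate(eta, y, lam, m):
    """Solve G_{n,lam} mu = (1/n) sum_i eta_i y_i with H = R^D and H_0 = first m coords."""
    n, dim = eta.shape
    chi1 = np.diag(np.r_[np.zeros(m), np.ones(dim - m)])  # projection onto H_1
    G = eta.T @ eta / n + lam * chi1  # (1/n) sum_i eta_i <eta_i, .> + lam * chi_1
    return np.linalg.solve(G, eta.T @ y / n)


def objective(mu, eta, y, lam, m):
    """f_n(mu) = (1/n) sum_i (y_i - <eta_i, mu>)^2 + lam * ||chi_1 mu||^2."""
    return np.mean((y - eta @ mu) ** 2) + lam * np.sum(mu[m:] ** 2)


rng = np.random.default_rng(0)
n, dim, m, alpha = 40, 12, 2, 1.0
eta = rng.standard_normal((n, dim))
eta *= alpha / np.linalg.norm(eta, axis=1, keepdims=True)  # ||eta_i|| = alpha
mu_true = rng.standard_normal(dim)
y = eta @ mu_true + 0.1 * rng.standard_normal(n)
lam = n ** -0.5  # lambda_n = n^{-p} with p = 1/2

mu_n = spline_estimate(eta, y, lam, m)
# Gradient of f_n at mu_n: -(2/n) sum_i (y_i - L_i mu) eta_i + 2 lam chi_1 mu.
grad = -2 * eta.T @ (y - eta @ mu_n) / n + 2 * lam * np.r_[np.zeros(m), mu_n[m:]]
print(np.max(np.abs(grad)))  # of the order of machine precision
```

Since $f_n^{(\omega)}$ is a strictly convex quadratic here, a vanishing gradient identifies the unique minimizer.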
For the general spline problem we allow multivariate regression, that is $t_i \in \mathbb{R}^d$; see, for example, [49, Section 7] for multivariate P-splines. However, when discussing the special spline problem we will often assume $d = 1$ since, although our convergence results still hold for $d > 1$, there are regularity issues; for example, for $2m \leq d$ minimizers are not automatically continuous (for $2m > d$ the Sobolev space $H^m$ on $\mathbb{R}^d$ is embedded in $C^0$, but this is not true for $2m \leq d$).

The existence of a unique minimizer to (2) is established in the following lemma. 


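Lemma 2.8. Define $f_n^{(\omega)} : \mathcal{H} \to \mathbb{R}$ by (2) and assume $\lambda_n > 0$. Under Assumptions 1-4 the operator $G_{n,\lambda_n} : \mathcal{H} \to \mathcal{H}$ defined by (3) has a well defined inverse $G_{n,\lambda_n}^{-1}$ on $\mathrm{span}\{\eta_1, \ldots, \eta_n\}$ for almost every $\omega \in \Omega$. In particular, there almost surely exists $N < \infty$ such that for all $n \geq N$ there exists a unique minimizer $\mu^n \in \mathcal{H}$ of $f_n^{(\omega)}$ which is given by

$$\mu^n = \frac{1}{n} \sum_{i=1}^n y_i \, G_{n,\lambda_n}^{-1} \eta_i. \tag{4}$$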


Proof. We claim that any minimizer of $f_n^{(\omega)}$ lies in the set $\mathcal{H}_0 \oplus \mathrm{span}\{\chi_1 \eta_1, \ldots, \chi_1 \eta_n\} =: \mathcal{H}'_n$. If this is so, and it can be shown that $G_{n,\lambda_n}^{-1}$ is well defined on $\mathcal{H}'_n$, then we can conclude that the minimizer must be of the form (4).

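We define $\Omega' \subseteq \Omega$ by

$$\Omega' := \left\{ \omega \in \Omega : \text{the number of unique } t_i \text{ in } \omega \text{ is greater than } m \text{ and } \|L_i\|_{\mathcal{H}^*} \leq \alpha \ \forall i \right\}.$$

By Assumptions 3 and 4, $\mathbb{P}(\Omega') = 1$. Let $\omega \in \Omega'$; then there exists $N$ such that for all $n \geq N$ we have that $\{L_i\}_{i=1}^n$ contains $m$ distinct elements. Therefore

$$\|\mu\|_{\mathcal{H}'_n}^2 := \frac{1}{n} \sum_{i=1}^n (L_i \mu)^2 + \lambda_n \|\mu\|_1^2$$

defines a norm on $\mathcal{H}'_n$ for any $n \geq N$ and, as $\mathcal{H}'_n$ is finite dimensional, we arrive at the same topology whichever norm we choose.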

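We first show that any minimizer of $f_n^{(\omega)}$ lies in $\mathcal{H}'_n$. Let $\mu = \sum_{j=1}^m a_j \phi_j + \sum_{j=1}^n b_j \chi_1 \eta_j + p$, where the $\phi_j$ are a basis for $\mathcal{H}_0$ and $p \in (\mathcal{H}'_n)^\perp$. Then, since $L_i p = \langle \eta_i, p \rangle = 0$, we have

$$f_n^{(\omega)}(\mu) = \frac{1}{n} \sum_{i=1}^n \left( y_i - L_i \chi_{\mathcal{H}'_n} \mu \right)^2 + \lambda_n \Big\| \sum_{j=1}^n b_j \chi_1 \eta_j \Big\|_1^2 + \lambda_n \|p\|_1^2$$

where $\chi_{\mathcal{H}'_n}$ denotes the projection onto $\mathcal{H}'_n$. Trivially any minimizer of $f_n^{(\omega)}$ must have $\|p\|_1 = 0$ and, since $p \in (\mathcal{H}'_n)^\perp \subseteq \mathcal{H}_1$ (because $\mathcal{H}_0 \subseteq \mathcal{H}'_n$), this implies $p = 0$. Hence minimizers of $f_n^{(\omega)}$ lie in $\mathcal{H}'_n$.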

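We now show that $G_{n,\lambda_n}$ has a well defined inverse on $\mathcal{H}'_n$; that is, we want to show that for any $r \in \mathcal{H}'_n$ there exists $\mu^n \in \mathcal{H}'_n$ such that $G_{n,\lambda_n} \mu^n = r$. The weak formulation of $G_{n,\lambda_n} \mu^n = r$ is given by

$$B(\mu^n, \nu) = \langle r, \nu \rangle \quad \forall \nu \in \mathcal{H}'_n$$

where

$$B(\mu, \nu) = \frac{1}{n} \sum_{i=1}^n (L_i \mu)(L_i \nu) + \lambda_n \langle \chi_1 \mu, \chi_1 \nu \rangle.$$

Now we apply the Lax-Milgram lemma to imply there exists a unique weak solution. Clearly $B : \mathcal{H}'_n \times \mathcal{H}'_n \to \mathbb{R}$ is a bilinear form. We will show it is also bounded and coercive. As $\omega \in \Omega'$, $\|L_i\|_{\mathcal{H}^*} \leq \alpha$ and for $\mu, \nu \in \mathcal{H}'_n$ we have

$$|B(\mu, \nu)| \leq \frac{1}{n} \sum_{i=1}^n |L_i \mu| \, |L_i \nu| + \lambda_n \|\chi_1 \mu\| \, \|\chi_1 \nu\| \leq (\alpha^2 + \lambda_n) \|\mu\| \, \|\nu\|.$$

Hence $B$ is bounded. Similarly, for some constant $c > 0$ independent of $\mu$,

$$B(\mu, \mu) = \frac{1}{n} \sum_{i=1}^n (L_i \mu)^2 + \lambda_n \|\mu\|_1^2 = \|\mu\|_{\mathcal{H}'_n}^2 \geq c \|\mu\|^2,$$

where the inequality follows by the equivalence of norms on finite dimensional spaces. Hence $B$ is coercive and by the Lax-Milgram Lemma there exists a unique weak solution. We have shown that for any $r \in \mathcal{H}'_n$ there exists $\mu^n \in \mathcal{H}'_n$ such that $B(\mu^n, \nu) = \langle r, \nu \rangle$ for all $\nu \in \mathcal{H}'_n$.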

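A strong solution follows from the equivalence of the strong and weak topologies on finite dimensional spaces or, alternatively, from the following short calculation. We have

$$\langle r, \nu \rangle = B(\mu^n, \nu) \quad \forall \nu \in \mathcal{H}'_n,$$

that is,

$$\Big\langle r - \frac{1}{n} \sum_{i=1}^n (L_i \mu^n) \eta_i - \lambda_n \chi_1 \mu^n, \nu \Big\rangle = 0 \quad \forall \nu \in \mathcal{H}'_n.$$

So choosing $\nu = r - \frac{1}{n} \sum_{i=1}^n (L_i \mu^n) \eta_i - \lambda_n \chi_1 \mu^n$ implies $\big\| r - \frac{1}{n} \sum_{i=1}^n (L_i \mu^n) \eta_i - \lambda_n \chi_1 \mu^n \big\|^2 = 0$ and therefore

$$r = \frac{1}{n} \sum_{i=1}^n (L_i \mu^n) \eta_i + \lambda_n \chi_1 \mu^n = G_{n,\lambda_n} \mu^n.$$

As this is true for all $r \in \mathcal{H}'_n$ we can infer the existence of an inverse operator $G_{n,\lambda_n}^{-1} : \mathcal{H}'_n \to \mathcal{H}'_n$ such that $G_{n,\lambda_n}^{-1} r = \mu^n$. One can verify that $G_{n,\lambda_n}^{-1}$ is linear. As $\omega \in \Omega'$ was arbitrary, the result holds almost surely. □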
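Two facts from the proof can be observed numerically in a finite-dimensional stand-in, again an illustration under our own assumptions ($\mathcal{H} = \mathbb{R}^D$, $\mathcal{H}_0$ the first $m$ coordinates, random representers $\eta_i$ with $\|\eta_i\| = \alpha$). When $D > m + n$ the minimizer has no component orthogonal to $\mathcal{H}'_n = \mathcal{H}_0 \oplus \mathrm{span}\{\chi_1\eta_1, \ldots, \chi_1\eta_n\}$, and the operator norm of $G_{n,\lambda_n}$ is at most $\alpha^2 + \lambda_n$, which is the boundedness estimate used for $B$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim, m, alpha = 8, 50, 2, 2.0          # dim > m + n, so H'_n is a proper subspace
lam = n ** -0.5
eta = rng.standard_normal((n, dim))
eta *= alpha / np.linalg.norm(eta, axis=1, keepdims=True)  # ||eta_i|| = alpha
y = rng.standard_normal(n)

chi1 = np.diag(np.r_[np.zeros(m), np.ones(dim - m)])       # projection onto H_1
G = eta.T @ eta / n + lam * chi1                           # the operator G_{n,lambda_n}
mu_n = np.linalg.solve(G, eta.T @ y / n)

# Orthonormal basis of H'_n = H_0 + span{chi_1 eta_i} via a reduced QR factorisation.
basis = np.hstack([np.eye(dim)[:, :m], chi1 @ eta.T])
Q, _ = np.linalg.qr(basis)
residual = mu_n - Q @ (Q.T @ mu_n)                         # component orthogonal to H'_n

op_norm = np.linalg.norm(G, 2)                             # largest singular value of G
print(np.linalg.norm(residual), op_norm, alpha**2 + lam)
```

In this finite-dimensional setting $G_{n,\lambda_n}$ happens to be invertible on all of $\mathbb{R}^D$, so the check is that solving on the whole space still lands in $\mathcal{H}'_n$.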


3 Consistency 

We demonstrate consistency by applying the T-convergence framework. This requires us to find the 
T-limit, to show that the T-limit has a unique minimi z er and that the minimi z ers of //T' j are uniformly 
bounded. The next three subsections demonstrate that each of these requirements is satisfied under the 
stated assumptions and allow the application of Corollary 2.7 to conclude the consistency of the spline 
model, as summarized in Theorem 3.1. We start by stating the remainder of the conditions employed. 

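Assumptions:

5. We have $\lambda_n = n^{-p}$ with $0 < p \leq \frac{1}{2}$.

6. For $\nu \in \mathcal{H}$ the following relation holds:

$$\int_{\mathcal{I}} (L_t \nu)^2 \, \phi_T(\mathrm{d}t) = 0 \iff \nu = 0.$$

7. For each $\mu \in \mathcal{H}$, $L_t \mu$ is continuous in $t$, i.e. $\|L_s - L_t\|_{\mathcal{H}^*} \to 0$ as $s \to t$.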
Assumption 5 gives the admissible scaling regime in $\lambda_n$. Clearly if $p \leq 0$ then $\lambda_n \not\to 0$, hence we expect the limit, if it even exists, to be biased towards solutions more regular than $\mu^\dagger$. We are required to show that the minimizers are bounded in probability. To do so we show they are bounded in expectation. We will show in Theorem 3.3 that for $p > \frac{1}{2}$ we cannot bound minimizers in expectation; hence it is not possible to extend our proofs to $p \notin (0, \frac{1}{2}]$. Theorem 3.1 is stated in probability rather than in expectation because the $\Gamma$-convergence framework requires $\mu^n$ to be a minimizer and, as such, we cannot draw conclusions about the "average minimizer", since $\mathbb{E}[\mu^n \,|\, \mathcal{G}_n]$ is not a minimizer.

We will show that the second derivative of $f_\infty$ in the direction $\nu$ is given by $2 \int_{\mathcal{I}} (L_t \nu)^2 \, \phi_T(\mathrm{d}t)$. Assumption 6 is used to establish that $f_\infty$ is strictly convex, and hence that its minimizer is unique.

It will be necessary to show that

$$\frac{1}{n} \sum_{i=1}^n |L_i \mu| \to \int_{\mathcal{I}} |L_t \mu| \, \phi_T(\mathrm{d}t) \tag{5}$$

for all $\mu \in \mathcal{H}$ with probability one. We impose Assumption 7 (together with Assumption 4) to imply that $L_t \mu$ is continuous and bounded in $t$ for all $\mu \in \mathcal{H}$, and therefore by the weak convergence of the empirical measure we infer that (5) holds for all $\mu \in \mathcal{H}$ and for almost every sequence $\{L_i\}_{i=1}^\infty$. In particular, we can define a set $\Omega' \subseteq \Omega$, independent of $\mu$, on which (5) holds, such that $\mathbb{P}(\Omega') = 1$.
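The convergence (5) is a law-of-large-numbers statement for the bounded continuous function $t \mapsto |L_t \mu|$. A quick Monte Carlo check in the special spline setting, under assumptions of our own choosing ($L_t \mu = \mu(t)$, $\phi_T$ uniform on $[0,1]$, and the test function $\mu(t) = \sin(2\pi t) + t$), compares the empirical average with a fine midpoint-rule approximation of the integral.

```python
import numpy as np


def mu(t):
    # Illustrative test function (an assumption, not from the text).
    return np.sin(2 * np.pi * t) + t


# Right-hand side of (5): integral of |mu(t)| against phi_T = Uniform[0, 1],
# approximated by a midpoint rule on a fine grid.
N = 1_000_000
midpoints = (np.arange(N) + 0.5) / N
integral = np.mean(np.abs(mu(midpoints)))

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    t = rng.uniform(0.0, 1.0, n)
    empirical = np.mean(np.abs(mu(t)))  # left-hand side of (5)
    print(n, empirical, integral)
```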


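Theorem 3.1. Define $f_n^{(\omega)} : \mathcal{H} \to \mathbb{R}$ by (2). Under Assumptions 1-7 the minimizer $\mu^n$ of $f_n^{(\omega)}$ converges in the following sense: for all $\epsilon, \delta > 0$ and $F \in \mathcal{H}^*$ there exists $N = N(\epsilon, \delta, F) \in \mathbb{N}$ such that

$$\mathbb{P}\left( \left| F(\mu^n) - F(\mu^\dagger) \right| > \epsilon \right) < \delta$$

for $n \geq N$.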


Remark 3.2. We view the mode of convergence in the above theorem as a natural generalization of convergence in probability; it is weaker than convergence weakly in probability, which would require that the convergence $\mu^n \to \mu^\dagger$ were uniform over $F \in \mathcal{H}^*$ and not pointwise as established in the theorem.

The following theorem shows that if $p > \frac{1}{2}$ then, without imposing further assumptions, it is always possible to construct observation functionals $\{L_t\}_{t \in \mathcal{I}}$ such that $\mathbb{E}[\|\mu^n\|^2] \to \infty$.

Theorem 3.3. Define $f_n^{(\omega)} : \mathcal{H} \to \mathbb{R}$ by (2), let $\mu^n$ be the minimizer of $f_n^{(\omega)}$ and take any $\alpha > 0$ and $p > \frac{1}{2}$. Take Assumptions 1-2 and assume that $\lambda_n = n^{-p}$. Then there exists a distribution $\phi_T$ on $\mathcal{I}$ such that $\|L_t\|_{\mathcal{H}^*} = \|\eta_t\| \leq \alpha$ for almost every $\omega \in \Omega$ (i.e. Assumption 4 holds) and $\mathbb{E}[\|\mu^n\|^2] \to \infty$.

In the special spline model, when $\lambda_n \to 0$ too quickly the functions $\mu^n$ begin to interpolate the data points $\{(t_i, y_i)\}_{i=1}^n$, hence the derivative of $\mu^n$ will not stay bounded. Furthermore, when considering weak convergence, one is restricting to finite dimensional projections. It is therefore not surprising that $n^{-\frac{1}{2}}$ is the best we can do. For $p > \frac{1}{2}$ and a sequence of real valued iid random variables $X_i$ of finite variance (which are not identically zero) we have $n^{2p} \, \mathbb{E}\big( \frac{1}{n} \sum_{i=1}^n X_i \big)^2 \to \infty$. In light of this elementary observation Theorem 3.3 is not surprising. The proof is given in Section 3.4.
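The elementary observation above can be checked directly. For iid $X_i$ with mean $m_X$ and variance $s^2$ one has $\mathbb{E}\big(\frac{1}{n}\sum_{i=1}^n X_i\big)^2 = m_X^2 + \frac{s^2}{n}$, so $n^{2p}\,\mathbb{E}\big(\frac{1}{n}\sum_{i=1}^n X_i\big)^2 \geq s^2 n^{2p-1} \to \infty$ whenever $p > \frac{1}{2}$ and $s > 0$ (and also whenever $m_X \neq 0$). The sketch below, which assumes standard normal $X_i$ purely for illustration, compares a Monte Carlo estimate with this closed form for $p = \frac{3}{4}$; the function name is hypothetical.

```python
import numpy as np


def scaled_second_moment(n, p, reps=4000, seed=0):
    """Monte Carlo estimate of n^{2p} E[(mean of X_1, ..., X_n)^2] for iid N(0, 1) X_i."""
    rng = np.random.default_rng(seed)
    means = rng.standard_normal((reps, n)).mean(axis=1)
    return n ** (2 * p) * np.mean(means**2)


p = 0.75
for n in (10, 100, 1000):
    exact = n ** (2 * p - 1)  # n^{2p} * (0^2 + 1/n) for standard normal X_i
    print(n, scaled_second_moment(n, p), exact)
```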

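3.1 The $\Gamma$-Limit

We claim the $\Gamma$-limit of $f_n^{(\omega)}$, for almost every $\omega \in \Omega$, is given by

$$f_\infty(\mu) = \int_{\mathcal{I}} \int_{-\infty}^{\infty} |y - L_t \mu|^2 \, \phi_{L_t \mu^\dagger}(\mathrm{d}y) \, \phi_T(\mathrm{d}t). \tag{6}$$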
Theorem 3.4. Define $f_n^{(\omega)}, f_\infty : \mathcal{H} \to \mathbb{R}$ by (2) and (6) respectively. Under Assumptions 1-2, 5 and 7,

$$f_\infty = \Gamma\text{-}\lim_n f_n^{(\omega)}$$

for almost every $\omega \in \Omega$.


Proof. We are required to show that the two inequalities in Definition 2.5 hold with probability 1. In order to do this we consider a subset of $\Omega$ of full measure, $\Omega'$, and show that both statements hold for every data sequence obtained from that set.

Define $g_\nu(t, y) = (y - L_t \nu)^2$. For clarity let $P(\mathrm{d}(t, y)) = \phi_T(\mathrm{d}t) \, \phi_{L_t \mu^\dagger}(\mathrm{d}y)$ and let $P_n$ be the empirical measure associated with the observations, i.e. for any measurable $h : \mathcal{I} \times \mathbb{R} \to \mathbb{R}$ we define $P_n h = \frac{1}{n} \sum_{i=1}^n h(t_i, y_i)$. Further, let $P_n^{(\omega)}$ denote the measure arising from the particular realization $\omega$. Defining

$$\Omega' = \left\{ \omega \in \Omega : P_n^{(\omega)} \Rightarrow P, \ \frac{1}{n} \sum_{i=1}^n \epsilon_i(\omega)^2 \to \sigma^2 \text{ and } \frac{1}{n} \sum_{i=1}^n \epsilon_i(\omega) \eta_i \to 0 \right\},$$

then $\mathbb{P}(\Omega') = 1$ by the almost sure weak convergence of the empirical measure [13, Theorem 11.4.1] and the strong law of large numbers. Let $\omega \in \Omega'$.

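We start with the lim inf inequality. Pick $\nu \in \mathcal{H}$ and let $\nu_n \rightharpoonup \nu$. By Theorem 1.1 in [15] we have

$$\int_{\mathcal{I}} \int_{-\infty}^{\infty} \liminf_{n \to \infty, (t', y') \to (t, y)} g_{\nu_n}(t', y') \, P(\mathrm{d}(t, y)) \leq \liminf_{n \to \infty} \int_{\mathcal{I}} \int_{-\infty}^{\infty} g_{\nu_n}(t, y) \, P_n^{(\omega)}(\mathrm{d}(t, y)) \leq \liminf_{n \to \infty} f_n^{(\omega)}(\nu_n).$$

Now we show

$$\liminf_{n \to \infty, (t', y') \to (t, y)} g_{\nu_n}(t', y') \geq g_\nu(t, y), \tag{7}$$

which, since $\int_{\mathcal{I}} \int_{-\infty}^{\infty} g_\nu \, \mathrm{d}P = f_\infty(\nu)$, proves the lim inf inequality. Let $(t_m, y_m) \to (t, y)$; then

$$\begin{aligned}
(g_{\nu_n}(t_m, y_m))^{\frac{1}{2}} &= |y_m - L_{t_m} \nu_n| \\
&\geq |L_{t_m} \nu_n - y| - |y_m - y| \\
&\geq |y - L_t \nu_n| - |L_{t_m} \nu_n - L_t \nu_n| - |y_m - y| \\
&\geq |y - L_t \nu_n| - \|L_{t_m} - L_t\|_{\mathcal{H}^*} \|\nu_n\| - |y_m - y|.
\end{aligned}$$

A consequence of the uniform boundedness principle is that any weakly convergent sequence is bounded, hence there exists some $C > 0$ such that $\|\nu_n\| \leq C$. Since, in addition, $L_t \nu_n \to L_t \nu$ by weak convergence, it follows from the above and Assumption 7 that

$$\liminf_{n \to \infty, m \to \infty} (g_{\nu_n}(t_m, y_m))^{\frac{1}{2}} \geq |y - L_t \nu| = (g_\nu(t, y))^{\frac{1}{2}}.$$

As our choice of sequence $(t_m, y_m)$ was arbitrary we can conclude that (7) holds.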

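For the recovery sequence we choose $\nu \in \mathcal{H}$ and let $\nu_n = \nu$. We are required to show

$$P g_\nu \geq \limsup_{n \to \infty} \left( P_n^{(\omega)} g_\nu + \lambda_n \|\nu\|_1^2 \right) = \limsup_{n \to \infty} P_n^{(\omega)} g_\nu.$$

Since we can write

$$g_\nu(t_i, y_i) = (L_i \mu^\dagger)^2 + \epsilon_i^2 + (L_i \nu)^2 + 2 \epsilon_i L_i \mu^\dagger - 2 L_i \mu^\dagger L_i \nu - 2 \epsilon_i L_i \nu$$

and each term is either a continuous and bounded functional, or its convergence is addressed directly by the construction of $\Omega'$, we have $P_n^{(\omega)} g_\nu \to P g_\nu$ as required. As $\omega \in \Omega'$ was arbitrary, the result holds almost surely. □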
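For the special spline model the limit (6) has a simple closed form: by (1) the observations $y$ given $t$ have mean $L_t\mu^\dagger$ and variance $\sigma^2$, so $f_\infty(\nu) = \sigma^2 + \int_{\mathcal{I}} (L_t \mu^\dagger - L_t \nu)^2 \, \phi_T(\mathrm{d}t)$. The sketch below checks $P_n^{(\omega)} g_\nu \to P g_\nu$ along the recovery sequence $\nu_n = \nu$ by simulation, under illustrative assumptions of our own: $\mathcal{I} = [0,1]$, $\phi_T$ uniform, $L_t \mu = \mu(t)$, $\mu^\dagger(t) = \sin(2\pi t)$, $\nu(t) = t$ and Gaussian noise with $\sigma = 0.5$. For these choices $f_\infty(\nu) = \sigma^2 + \frac{1}{2} + \frac{1}{\pi} + \frac{1}{3}$, and since $\nu$ is linear the penalty $\lambda_n \|\nu\|_1^2$ vanishes identically when $m = 2$.

```python
import numpy as np

sigma = 0.5


def mu_dagger(t):
    return np.sin(2 * np.pi * t)   # illustrative "true" function (assumption)


def nu(t):
    return t                       # fixed recovery sequence nu_n = nu (assumption)


# Closed form of (6): sigma^2 + int_0^1 (sin(2 pi t) - t)^2 dt = sigma^2 + 1/2 + 1/pi + 1/3.
f_inf = sigma**2 + 0.5 + 1 / np.pi + 1 / 3

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    t = rng.uniform(0.0, 1.0, n)
    y = mu_dagger(t) + sigma * rng.standard_normal(n)
    empirical = np.mean((y - nu(t)) ** 2)   # P_n g_nu; the penalty term is zero here
    print(n, empirical, f_inf)
```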
Remark 3.5. Note that in the above theorem we did not need any bound on how quickly $\lambda_n$ decays (only that $\lambda_n > 0$). We only used that $\lambda_n = o(1)$.


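3.2 Uniqueness of the $\Gamma$-limit

To show the $\Gamma$-limit has a unique minimizer we show it is strictly convex. The following lemma gives the second Gateaux derivative of $f_\infty$, after which we conclude in Corollary 3.7 that the minimizer of the $\Gamma$-limit is unique.

Lemma 3.6. Under Assumptions 1-2 define $f_\infty : \mathcal{H} \to \mathbb{R}$ by (6). Then the first and second Gateaux derivatives of $f_\infty$ are given by

$$\partial f_\infty(\mu; \nu) = 2 \int_{\mathcal{I}} \int_{-\infty}^{\infty} (L_t \mu - y) L_t(\nu) \, \phi_{L_t \mu^\dagger}(\mathrm{d}y) \, \phi_T(\mathrm{d}t),$$

$$\partial^2 f_\infty(\mu; \nu, \zeta) = 2 \int_{\mathcal{I}} (L_t \nu)(L_t \zeta) \, \phi_T(\mathrm{d}t).$$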


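Proof. We first compute the first Gateaux derivative. We have

$$\begin{aligned}
\partial f_\infty(\mu;\nu) &= \lim_{r\to0}\int_X\int_{-\infty}^{\infty}\frac{(y - L_t(\mu + r\nu))^2 - (y - L_t\mu)^2}{r}\,\phi_{L_t\mu^\dagger}(dy)\,\phi_T(dt) \\
&= 2\int_X\int_{-\infty}^{\infty}(L_t\mu - y)L_t(\nu)\,\phi_{L_t\mu^\dagger}(dy)\,\phi_T(dt) + \lim_{r\to0} r\int_X\int_{-\infty}^{\infty}(L_t\nu)^2\,\phi_{L_t\mu^\dagger}(dy)\,\phi_T(dt) \\
&= 2\int_X\int_{-\infty}^{\infty}(L_t\mu - y)L_t(\nu)\,\phi_{L_t\mu^\dagger}(dy)\,\phi_T(dt),
\end{aligned}$$

recalling that $L_t$ is linear. The second Gateaux derivative follows similarly:

$$\begin{aligned}
\partial^2 f_\infty(\mu;\nu,\zeta) &= \lim_{r\to0} 2\int_X\int_{-\infty}^{\infty}\frac{(L_t(\mu + r\zeta) - y)L_t\nu - (L_t\mu - y)L_t\nu}{r}\,\phi_{L_t\mu^\dagger}(dy)\,\phi_T(dt) \\
&= 2\int_X\int_{-\infty}^{\infty}(L_t\nu)(L_t\zeta)\,\phi_{L_t\mu^\dagger}(dy)\,\phi_T(dt) \\
&= 2\int_X (L_t\nu)(L_t\zeta)\,\phi_T(dt).
\end{aligned}$$

□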
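The derivative formulas of Lemma 3.6 can be checked by finite differences in a toy setting. The sketch below assumes, for illustration only, that $\mathcal{H} = \mathbb{R}^d$ with the Euclidean inner product, that $\phi_T$ is uniform on $K$ points with randomly drawn representers $\eta_t$, and that the noise takes the values $\pm 0.5$ with equal probability. Central differences are used because they are exact, up to rounding, for the quadratic $f_\infty$ and for the linear map $\mu \mapsto \partial f_\infty(\mu;\nu)$.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(a, x, y):
    # returns a * x + y for vectors x, y
    return [a * xi + yi for xi, yi in zip(x, y)]

rng = random.Random(1)
d, K = 3, 5
etas = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(K)]  # eta_t for each of the K points
weights = [1.0 / K] * K                                            # toy phi_T: uniform on K points
noise = [(-0.5, 0.5), (0.5, 0.5)]                                  # (value, probability), mean zero
mu_dag = [rng.uniform(-1, 1) for _ in range(d)]

def f_inf(mu):
    total = 0.0
    for w, eta in zip(weights, etas):
        for e, p in noise:
            y = dot(eta, mu_dag) + e
            total += w * p * (y - dot(eta, mu)) ** 2
    return total

def d1(mu, nu):
    # first Gateaux derivative formula from Lemma 3.6
    total = 0.0
    for w, eta in zip(weights, etas):
        for e, p in noise:
            y = dot(eta, mu_dag) + e
            total += 2 * w * p * (dot(eta, mu) - y) * dot(eta, nu)
    return total

def d2(nu, zeta):
    # second Gateaux derivative formula from Lemma 3.6 (independent of mu)
    return 2 * sum(w * dot(eta, nu) * dot(eta, zeta) for w, eta in zip(weights, etas))

mu, nu, zeta = ([rng.uniform(-1, 1) for _ in range(d)] for _ in range(3))
r = 1e-4
fd1 = (f_inf(axpy(r, nu, mu)) - f_inf(axpy(-r, nu, mu))) / (2 * r)
fd2 = (d1(axpy(r, zeta, mu), nu) - d1(axpy(-r, zeta, mu), nu)) / (2 * r)
print(abs(fd1 - d1(mu, nu)) < 1e-6, abs(fd2 - d2(nu, zeta)) < 1e-6)
```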


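Corollary 3.7. Under Assumptions 1-2 and 6, define $f_\infty : \mathcal{H} \to \mathbb{R}$ by (6). Then $f_\infty$ has a unique minimizer, which is achieved at $\mu = \mu^\dagger$.

Proof. It is easy to check that $\partial f_\infty(\mu^\dagger;\nu) = 0$ for all $\nu \in \mathcal{H}$, since the inner integral $\int_{-\infty}^{\infty}(L_t\mu^\dagger - y)\,\phi_{L_t\mu^\dagger}(dy)$ vanishes when the noise has mean zero. By Lemma 3.6 and Assumption 6 the second Gateaux derivative satisfies $\partial^2 f_\infty(\mu;\nu,\nu) > 0$ for all $\nu \ne 0$. Then by Taylor's Theorem (and noting that $f_\infty$ is quadratic), for $\mu \ne \mu^\dagger$,

$$f_\infty(\mu) = f_\infty(\mu^\dagger) + \frac12\,\partial^2 f_\infty(\mu^\dagger;\mu - \mu^\dagger,\mu - \mu^\dagger) > f_\infty(\mu^\dagger)$$

as required. □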
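In a toy model of the same kind (an assumption made here, not part of the paper), the identity behind Corollary 3.7, $f_\infty(\mu) = f_\infty(\mu^\dagger) + \frac12\partial^2 f_\infty(\mu^\dagger;\mu-\mu^\dagger,\mu-\mu^\dagger)$, can be verified directly. The sketch uses a mean-zero three-point noise law; with six random representers in $\mathbb{R}^3$ the analogue of Assumption 6 holds almost surely, so the gap $f_\infty(\mu) - f_\infty(\mu^\dagger)$ should be strictly positive.

```python
import random

rng = random.Random(2)
d, K = 3, 6
etas = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(K)]
weights = [1.0 / K] * K
noise = [(-1.0, 0.25), (0.0, 0.5), (1.0, 0.25)]   # mean-zero noise law (illustrative)
mu_dag = [rng.uniform(-1, 1) for _ in range(d)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def f_inf(mu):
    # toy version of (6): expected squared residual under phi_T and the noise law
    return sum(w * p * (dot(eta, mu_dag) + e - dot(eta, mu)) ** 2
               for w, eta in zip(weights, etas) for e, p in noise)

def half_d2(h):
    # (1/2) * second Gateaux derivative at mu_dagger in direction h
    return sum(w * dot(eta, h) ** 2 for w, eta in zip(weights, etas))

mu = [rng.uniform(-2, 2) for _ in range(d)]
h = [a - b for a, b in zip(mu, mu_dag)]
gap = f_inf(mu) - (f_inf(mu_dag) + half_d2(h))
print(abs(gap) < 1e-9, f_inf(mu) > f_inf(mu_dag))
```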
3.3 Bound on Minimizers

In this subsection we show that $\|\mu^n\| = O_p(1)$. The bound in $\mathcal{H}_0$ can be obtained using fewer assumptions than the bound in $\mathcal{H}$, which is natural considering $\mathcal{H}_0$ is finite dimensional. We may choose the norm on $\mathcal{H}_0$ without changing the topology (all norms are equivalent on finite dimensional spaces). We will use

$$\|\mu\|_0 = \int_X |L_t\mu|\,\phi_T(dt).$$

Loosely speaking we can then write $\|\mu^n\|_0 \lesssim f_n^{(\omega)}(\mu^n)$. The bound in $\mathcal{H}_0$ then follows if $\min f_n^{(\omega)}$ is bounded. We make this argument rigorous in Lemma 3.8. After this result we concentrate on bounding $\mu^n$ in $\mathcal{H}$.


Lemma 3.8. Define $f_n^{(\omega)} : \mathcal{H} \to \mathbb{R}$ by (2). Under Assumptions 1-5 and 7 the minimizers $\mu^n$ of $f_n^{(\omega)}$ are, with probability one, eventually bounded in $\mathcal{H}_0$, i.e. for almost every $\omega \in \Omega$ there exist constants $C, N > 0$ such that $\|\mu^n\|_0 \le C$ for all $n \ge N$.

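Proof. We define $P$ and $P_n^{(\omega)}$ as in the proof of Theorem 3.4, let

$$\Omega' = \left\{\omega \in \Omega : \frac1n\sum_{i=1}^n\epsilon_i^2(\omega) \to \sigma^2 \text{ and } \frac1n\sum_{i=1}^n|\epsilon_i(\omega)| \to \mathbb{E}|\epsilon_1|\right\}$$

and let $\mu^n$ be a minimizer of $f_n^{(\omega)}$. Assume $\omega \in \Omega'$. As

$$f_n^{(\omega)}(\mu^n) \le f_n^{(\omega)}(\mu^\dagger) \le \frac1n\sum_{i=1}^n\epsilon_i^2 + \lambda_1\|\mu^\dagger\|_1^2,$$

there exists $N$ such that $f_n^{(\omega)}(\mu^n) \le \sigma^2 + \lambda_1\|\mu^\dagger\|_1^2 + 1$ for $n \ge N$.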
Note that for any $a, b \in \mathbb{R}$ we have

$$|a - b|^2 \ge \begin{cases} |a - b| & \text{if } |a - b| \ge 1, \\ |a - b| - 1 & \text{otherwise.}\end{cases}$$

In either case $|a - b|^2 \ge |a - b| - 1 \ge |a| - |b| - 1$. Now


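$$\begin{aligned}
f_n^{(\omega)}(\mu) &\ge \frac1n\sum_{i=1}^n|y_i - L_i\mu|^2 \\
&\ge \frac1n\sum_{i=1}^n\left(|L_i\mu| - |y_i| - 1\right) \\
&\ge \frac1n\sum_{i=1}^n|L_i\mu| - \frac1n\sum_{i=1}^n|L_i\mu^\dagger| - \frac1n\sum_{i=1}^n|\epsilon_i| - 1 \\
&\to \int_X |L_t\mu|\,\phi_T(dt) - c,
\end{aligned}$$

where the convergence follows since $|L_t\mu|$ is a continuous and bounded functional in $t$, and $c$ is given by

$$\lim_{n\to\infty}\left(\frac1n\sum_{i=1}^n|L_i\mu^\dagger| + \frac1n\sum_{i=1}^n|\epsilon_i| + 1\right) = \int_X|L_t\mu^\dagger|\,\phi_T(dt) + \mathbb{E}|\epsilon_1| + 1 =: c.$$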
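The elementary inequality $|a-b|^2 \ge |a-b| - 1 \ge |a| - |b| - 1$ that drives the chain above is easy to spot-check. This small sketch evaluates it on a grid of quarter-integers (exactly representable in floating point); it is an illustration, not a proof, and the function name is hypothetical.

```python
def chain_holds(a, b):
    diff = abs(a - b)
    # |a - b|^2 >= |a - b| - 1 >= |a| - |b| - 1
    return diff ** 2 >= diff - 1 >= abs(a) - abs(b) - 1

grid = [k / 4 for k in range(-40, 41)]
result = all(chain_holds(a, b) for a in grid for b in grid)
print(result)
```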



We now show that $\int_X |L_t\mu|\,\phi_T(dt)$ is a norm on $\mathcal{H}_0$ and hence that the above constant, $c$, is finite. This will also show that $\|\mu^n\|_0 \le \sigma^2 + \lambda_1\|\mu^\dagger\|_1^2 + 1 + c$ for $n \ge N$, which completes the proof.

The triangle inequality, absolute homogeneity and nonnegativity of $\int_X |L_t\mu|\,\phi_T(dt)$ are trivial to establish. By Assumption 3, we have at least $m$ disjoint subsets of positive measure (with respect to $\phi_T$) on $X$. If $\int_X |L_t\mu|\,\phi_T(dt) = 0$ then it follows that $L_t\mu = 0$ on each of these subsets. As $\mathcal{H}_0$ is $m$-dimensional this determines $\mu$, and hence $\mu = 0$.

As $\omega \in \Omega'$ was arbitrary and $P(\Omega') = 1$, the result holds almost surely. □


Remark 3.9. In the above lemma we did not need the lower bound on $\lambda_n$ (only that $\lambda_n > 0$). The result holds for all $\lambda_n = O(1)$.

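Continuing with the bound in $\mathcal{H}$ we write

$$\mu^n = \frac1n\sum_{i=1}^n G_{n,\lambda_n}^{-1}L_i^*L_i\mu^\dagger + \frac1n\sum_{i=1}^n\epsilon_iG_{n,\lambda_n}^{-1}\eta_i = G_{n,\lambda_n}^{-1}U_n\mu^\dagger + \frac1n\sum_{i=1}^n\epsilon_iG_{n,\lambda_n}^{-1}\eta_i \qquad (8)$$

where

$$U_n = \frac1n\sum_{i=1}^n\eta_iL_i. \qquad (9)$$

We bound $\|G_{n,\lambda_n}^{-1}U_n\mu^\dagger\|$ in Lemma 3.11 and $\left\|\frac1n\sum_{i=1}^n\epsilon_iG_{n,\lambda_n}^{-1}\eta_i\right\|$ in Lemma 3.12.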
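Decomposition (8) is linearity of $G_{n,\lambda_n}^{-1}$ applied to the normal equations. The sketch below assumes a finite-dimensional stand-in: $\mathcal{H} = \mathbb{R}^d$ with the Euclidean inner product, $\mathcal{H}_0$ the first $k$ coordinates and $\chi_1$ the coordinate projection onto the remaining ones, so that $G_{n,\lambda} = U_n + \lambda\chi_1$ and the penalised least-squares minimizer solves $G_{n,\lambda}\mu = \frac1n\sum_i y_i\eta_i$. All sizes, names and the hand-written Gaussian elimination are illustrative choices, not the paper's construction.

```python
import random

def solve(A, b):
    # Gaussian elimination with partial pivoting for a square system A x = b
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

rng = random.Random(3)
d, k, n, lam = 4, 2, 10, 0.1   # dim(H) = d, dim(H_0) = k, n observations, lambda_n = lam
etas = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
mu_dag = [rng.gauss(0, 1) for _ in range(d)]
eps = [rng.gauss(0, 0.3) for _ in range(n)]
y = [sum(e * m for e, m in zip(eta, mu_dag)) + ei for eta, ei in zip(etas, eps)]

# U_n = (1/n) sum_i eta_i eta_i^T; G = U_n + lam * chi_1, chi_1 projecting onto coordinates k..d-1
U = [[sum(eta[a] * eta[b] for eta in etas) / n for b in range(d)] for a in range(d)]
G = [[U[a][b] + (lam if (a == b and a >= k) else 0.0) for b in range(d)] for a in range(d)]

rhs = [sum(yi * eta[a] for yi, eta in zip(y, etas)) / n for a in range(d)]
mu_n = solve(G, rhs)                          # minimizer: G mu = (1/n) sum_i y_i eta_i
bias = solve(G, matvec(U, mu_dag))            # G^{-1} U_n mu_dagger
noise = solve(G, [sum(ei * eta[a] for ei, eta in zip(eps, etas)) / n for a in range(d)])
gap = max(abs(m - (b1 + b2)) for m, b1, b2 in zip(mu_n, bias, noise))
print(gap < 1e-8)
```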

In the proof of Lemma 3.11 we show that $G_{n,\lambda_n}^{-1} : \mathrm{Ran}(U_n) \to \mathrm{Ran}(U_n)$. Lemma 3.10 gives the conditions necessary to infer the existence of an orthonormal basis of eigenfunctions $\{\psi_j^{(n)}\}_{j=1}^n$ of $\mathrm{Ran}(U_n)$. Hence we can write

$$\|G_{n,\lambda_n}^{-1}U_n\nu\|^2 = \sum_{j=1}^n\left\langle G_{n,\lambda_n}^{-1}U_n\nu,\psi_j^{(n)}\right\rangle^2.$$

From here we exploit the fact that the $\psi_j^{(n)}$ are eigenfunctions. We leave the details until the proof of Lemma 3.11.

Lemma 3.12 is a consequence of being able to bound $\|G_{n,\lambda_n}^{-1}\|_{\mathcal{L}(\mathcal{H},\mathcal{H})}$ in terms of $\lambda_n$. One is then left to show $\mathbb{E}\left\|\frac1n\sum_{i=1}^n\epsilon_i\eta_i\right\|^2 = O\left(\frac1n\right)$. We start by showing that $U_n$ is compact, bounded, self-adjoint and positive semi-definite.


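Lemma 3.10. Define $U_n$ by (9). Under Assumptions 1 and 4, $U_n$ is almost surely a bounded, self-adjoint, positive semi-definite and compact operator on $\mathcal{H}$.

Proof. In this proof we consider $\omega \in \Omega'$ where $\Omega' = \{\omega : \|\eta_i(\omega)\| \le \alpha \text{ for all } i\}$, noting that $P(\Omega') = 1$ by Assumption 4.

Boundedness of $U_n$ follows easily as

$$\|U_n\nu\| \le \frac1n\sum_{i=1}^n\|\eta_i\|^2\|\nu\| \le \alpha^2\|\nu\|.$$

Let $\langle\cdot,\cdot\rangle_{\mathbb{R}^n}$ be the inner product on $\mathbb{R}^n$ given by

$$\langle x, y\rangle_{\mathbb{R}^n} = \frac1n\sum_{i=1}^n x_iy_i \qquad \forall x, y \in \mathbb{R}^n.$$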
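Now for $x \in \mathbb{R}$ and $\nu \in \mathcal{H}$ we have

$$\langle x, L_i\nu\rangle_{\mathbb{R}} = xL_i\nu = x\langle\eta_i,\nu\rangle = \langle x\eta_i,\nu\rangle,$$

which shows $L_i^* : \mathbb{R} \to \mathcal{H}$ is given by $L_i^*x = x\eta_i$. Now if we define $T_n = (L_1,\dots,L_n) : \mathcal{H} \to \mathbb{R}^n$ then for $x \in \mathbb{R}^n$, $\nu \in \mathcal{H}$

$$\langle T_n\nu, x\rangle_{\mathbb{R}^n} = \frac1n\sum_{i=1}^n L_i\nu\,x_i = \left\langle\frac1n\sum_{i=1}^n x_i\eta_i,\nu\right\rangle.$$

Hence $T_n^*x = \frac1n\sum_{i=1}^n x_i\eta_i$. We have shown $U_n = T_n^*T_n$, which is therefore self-adjoint.

To show $U_n$ is positive semi-definite we need $\langle U_n\nu,\nu\rangle \ge 0$ for all $\nu \in \mathcal{H}$. This follows easily as

$$\langle U_n\nu,\nu\rangle = \frac1n\sum_{i=1}^n(L_i\nu)^2 \ge 0.$$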

For compactness of $U_n$ (for $n$ fixed) let $\nu^m$ be a sequence with $\|\nu^m\| \le 1$. Since $|L_i\nu^m| \le \|\eta_i\|\,\|\nu^m\| \le \alpha$ for every $\omega \in \Omega'$, there exists a convergent subsequence $m_p$ such that

$$L_i\nu^{m_p} \to z_i \qquad \text{for } i = 1, 2, \dots, n, \text{ say}.$$

So $U_n\nu^{m_p} \to \frac1n\sum_{i=1}^n z_i\eta_i \in \mathcal{H}$ as $m_p \to \infty$. Therefore each $U_n$ is compact. □
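The properties in Lemma 3.10 can be observed numerically in a finite-dimensional example. In this sketch (assuming $\mathcal{H} = \mathbb{R}^d$ and representers normalised so that $\|\eta_i\| = \alpha$, mimicking Assumption 4), $U_n$ is applied as $\nu \mapsto \frac1n\sum_i\eta_i\langle\eta_i,\nu\rangle$ and random directions are used to test self-adjointness, positive semi-definiteness and the bound $\|U_n\nu\| \le \alpha^2\|\nu\|$. The choice $n < d$ means $U_n$ has a nontrivial kernel, so it is semi-definiteness rather than definiteness that is being tested.

```python
import random

rng = random.Random(4)
d, n, alpha = 5, 3, 1.0
# representers normalised so that ||eta_i|| = alpha (mimicking Assumption 4)
etas = []
for _ in range(n):
    v = [rng.gauss(0, 1) for _ in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    etas.append([alpha * x / norm for x in v])

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def U(nu):
    # U_n nu = (1/n) sum_i eta_i <eta_i, nu>, i.e. T_n^* T_n nu
    coeffs = [inner(eta, nu) for eta in etas]
    return [sum(c * eta[a] for c, eta in zip(coeffs, etas)) / n for a in range(d)]

ok = True
for _ in range(200):
    nu = [rng.gauss(0, 1) for _ in range(d)]
    zeta = [rng.gauss(0, 1) for _ in range(d)]
    ok &= abs(inner(U(nu), zeta) - inner(nu, U(zeta))) < 1e-9     # self-adjoint
    ok &= inner(U(nu), nu) >= -1e-12                              # positive semi-definite
    ok &= inner(U(nu), U(nu)) ** 0.5 <= alpha ** 2 * inner(nu, nu) ** 0.5 + 1e-9  # bounded
print(ok)
```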

Using the basis whose existence is implied by the previous lemma, we can bound the first term on 
the RHS of (8). 


Lemma 3.11. Under Assumptions 1-4 define $G_{n,\lambda_n}$ and $U_n$ by (3) and (9) respectively. Then with probability one we have

$$\|G_{n,\lambda_n}^{-1}U_n\|_{\mathcal{L}(\mathcal{H},\mathcal{H})} \le 1$$

for all $n$.


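Proof. First note that $\dim(\mathrm{Ran}(U_n)) = \dim(\mathrm{span}\{\eta_1,\dots,\eta_n\}) \le n$. Without loss of generality we will assume $\dim(\mathrm{Ran}(U_n)) = n$ (else we can assume the dimension is $m_n$ where $m_n \le n$ is an increasing sequence). Clearly $\chi_1$ is a self-adjoint, bounded and compact operator on $\mathrm{Ran}(U_n)$, as is $U_n$ by Lemma 3.10. Therefore there exists a simultaneous diagonalisation of $U_n$ and $\chi_1$ on $\mathrm{Ran}(U_n)$, i.e. there exist $\beta_j^{(n)}$ and $\tau_j^{(n)}$ such that

$$U_n\psi_j^{(n)} = \beta_j^{(n)}\psi_j^{(n)} \quad\text{and}\quad \chi_1\psi_j^{(n)} = \tau_j^{(n)}\psi_j^{(n)}$$

for all $j = 1, 2, \dots, n$. Since $\chi_1$ is the projection operator we must have $\tau_j^{(n)} \in \{0,1\}$. Furthermore the $\psi_j^{(n)}$ form an orthonormal basis of $\mathrm{Ran}(U_n)$. Since $U_n$ is positive semi-definite it follows that $\beta_j^{(n)} \ge 0$. We have

$$G_{n,\lambda_n}\psi_j^{(n)} = U_n\psi_j^{(n)} + \lambda_n\chi_1\psi_j^{(n)} = \left(\beta_j^{(n)} + \lambda_n\tau_j^{(n)}\right)\psi_j^{(n)}.$$

So,

$$G_{n,\lambda_n}^{-1}\psi_j^{(n)} = \frac{1}{\beta_j^{(n)} + \lambda_n\tau_j^{(n)}}\,\psi_j^{(n)}.$$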
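In particular this shows that

$$G_{n,\lambda_n}^{-1}U_n : \mathcal{H} \to \mathrm{Ran}(U_n).$$

Assume $\nu \in \mathcal{H}$; then

$$\nu = \sum_{j=1}^n\left\langle\nu,\psi_j^{(n)}\right\rangle\psi_j^{(n)} + \nu^\perp \quad\text{and}\quad \|\nu\|^2 = \sum_{j=1}^n\left\langle\nu,\psi_j^{(n)}\right\rangle^2 + \|\nu^\perp\|^2,$$

where $\nu^\perp \in \mathrm{Ran}(U_n)^\perp$. Since $U_n$ is self-adjoint, $\mathrm{Ran}(U_n)^\perp = \mathrm{Ker}(U_n)$, so $U_n\nu^\perp = 0$. Therefore,

$$G_{n,\lambda_n}^{-1}U_n\nu = \sum_{j=1}^n\left\langle\nu,\psi_j^{(n)}\right\rangle G_{n,\lambda_n}^{-1}U_n\psi_j^{(n)} = \sum_{j=1}^n\frac{\beta_j^{(n)}}{\beta_j^{(n)} + \lambda_n\tau_j^{(n)}}\left\langle\nu,\psi_j^{(n)}\right\rangle\psi_j^{(n)}.$$

Hence

$$\|G_{n,\lambda_n}^{-1}U_n\nu\|^2 = \sum_{j=1}^n\left(\frac{\beta_j^{(n)}}{\beta_j^{(n)} + \lambda_n\tau_j^{(n)}}\right)^2\left\langle\nu,\psi_j^{(n)}\right\rangle^2 \le \sum_{j=1}^n\left\langle\nu,\psi_j^{(n)}\right\rangle^2 \le \|\nu\|^2.$$

This proves the lemma. □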
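Working in the eigenbasis $\{\psi_j^{(n)}\}$ that the proof assumes, the bound reduces to the fact that each filter factor $\beta_j^{(n)}/(\beta_j^{(n)} + \lambda_n\tau_j^{(n)})$ lies in $(0,1]$. The sketch below takes that simultaneous diagonalisation as given (it does not test it) and checks $\sum_j(\beta_j/(\beta_j + \lambda\tau_j))^2c_j^2 \le \sum_jc_j^2$ for random $\beta_j > 0$, $\tau_j \in \{0,1\}$ and coefficients $c_j = \langle\nu,\psi_j^{(n)}\rangle$. The function name is hypothetical.

```python
import random

def filtered_norm_sq(betas, taus, coeffs, lam):
    # ||G^{-1} U_n nu||^2 in the eigenbasis: sum_j (beta_j / (beta_j + lam * tau_j))^2 c_j^2
    return sum((b / (b + lam * t)) ** 2 * c ** 2 for b, t, c in zip(betas, taus, coeffs))

rng = random.Random(5)
ok = True
for _ in range(500):
    n = rng.randint(1, 8)
    betas = [rng.uniform(1e-3, 2.0) for _ in range(n)]   # beta_j > 0 on Ran(U_n)
    taus = [rng.choice([0, 1]) for _ in range(n)]        # tau_j in {0, 1}: chi_1 is a projection
    coeffs = [rng.gauss(0, 1) for _ in range(n)]          # c_j = <nu, psi_j>
    lam = 10 ** rng.uniform(-4, 1)
    ok &= filtered_norm_sq(betas, taus, coeffs, lam) <= sum(c * c for c in coeffs) + 1e-12
print(ok)
```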

We now focus on bounding $\|G_{n,\lambda_n}^{-1}u_n\|$ where $u_n = \frac1n\sum_{i=1}^n\epsilon_i\eta_i$.


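Lemma 3.12. Under Assumptions 1-5 define $G_{n,\lambda_n}$ by (3). Then

$$\mathbb{E}\left[\left\|\frac1n\sum_{i=1}^n\epsilon_iG_{n,\lambda_n}^{-1}\eta_i\right\|^2 \,\middle|\, Q_n\right] = O(1) \quad\text{almost surely.}$$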


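Proof. Recalling $B$ from the proof of Lemma 2.8, we have

$$\langle G_{n,\lambda_n}\mu,\mu\rangle = B(\mu,\mu) \ge \lambda_n\|\mu\|_1^2.$$

This implies $\|G_{n,\lambda_n}\mu\| \ge \lambda_n\|\mu\|_1$. By Lemma 2.8 there exists a well defined inverse of $G_{n,\lambda_n}$ at $\eta_i$, hence we let $\mu = G_{n,\lambda_n}^{-1}\eta_i$, and we have

$$\|G_{n,\lambda_n}^{-1}\eta_i\|_1 \le \frac{1}{\lambda_n}\|\eta_i\| \le \frac{\alpha}{\lambda_n}$$

almost surely. Now, define $u_n = \frac1n\sum_{i=1}^n\epsilon_i\eta_i$; since the $\epsilon_i$ are independent with mean zero and variance $\sigma^2$, the cross terms vanish and

$$\mathbb{E}\left[\|G_{n,\lambda_n}^{-1}u_n\|_1^2 \,\middle|\, Q_n\right] = \frac{1}{n^2}\sum_{i=1}^n\mathbb{E}[\epsilon_i^2]\,\|G_{n,\lambda_n}^{-1}\eta_i\|_1^2 \le \frac{\sigma^2\alpha^2}{n\lambda_n^2},$$

which is bounded whenever $n\lambda_n^2$ is bounded away from zero. Combined with Lemma 3.8 (the $\mathcal{H}_0$ bound) this proves the lemma. □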
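The variance computation in the proof only uses that the $\epsilon_i$ are independent with mean zero and variance $\sigma^2$, so that cross terms vanish. The sketch below checks $\mathbb{E}\|\frac1n\sum_i\epsilon_iv_i\|^2 = \frac{\sigma^2}{n^2}\sum_i\|v_i\|^2$ exactly by enumerating all sign patterns, assuming (for illustration) Rademacher-type noise $\epsilon_i = \pm\sigma$ and arbitrary vectors $v_i \in \mathbb{R}^d$ playing the role of $G_{n,\lambda_n}^{-1}\eta_i$. In the proof this identity is combined with $\|G_{n,\lambda_n}^{-1}\eta_i\|_1 \le \alpha/\lambda_n$.

```python
import itertools
import random

def exact_second_moment(vs, sigma):
    # E || (1/n) sum_i eps_i v_i ||^2 with eps_i = +sigma or -sigma independently (prob 1/2 each),
    # computed exactly by averaging over all 2^n sign patterns
    n, d = len(vs), len(vs[0])
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        s = [sum(sg * sigma * v[a] for sg, v in zip(signs, vs)) / n for a in range(d)]
        total += sum(x * x for x in s)
    return total / 2 ** n

rng = random.Random(6)
n, d, sigma = 6, 3, 0.7
vs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]   # stand-ins for G^{-1} eta_i
lhs = exact_second_moment(vs, sigma)
rhs = sigma ** 2 / n ** 2 * sum(sum(x * x for x in v) for v in vs)  # cross terms vanish
print(abs(lhs - rhs) < 1e-10)
```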

Recalling (8) and via Lemmas 3.11 and 3.12 we obtain the following asymptotic bound on minimizers in $\mathcal{H}$.

Theorem 3.13. Under Assumptions 1-5 we have

$$\mathbb{E}\left[\|\mu^n\|^2 \,\middle|\, Q_n\right] = O(1) \quad\text{almost surely.} \qquad (10)$$


This is a stronger result than we needed; we were only required to show that $\|\mu^n\|$ is bounded in probability. Taking expectation of (10) one has

$$\mathbb{E}\|\mu^n\|^2 = O(1).$$

Hence applying Chebyshev's inequality we may conclude that $\|\mu^n\| = O_p(1)$.

Corollary 3.14. Under Assumptions 1-5 we have $\|\mu^n\| = O_p(1)$.

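We conclude this section with a brief analysis of the rate of convergence. For any $F \in \mathcal{H}^*$, by the Riesz Representation Theorem, there exists $\xi \in \mathcal{H}$ such that $F(\mu) = \langle\mu,\xi\rangle$ for all $\mu \in \mathcal{H}$. Hence

$$F(\mu^n) - F(\mu^\dagger) = \left\langle G_{n,\lambda_n}^{-1}U_n\mu^\dagger - \mu^\dagger,\xi\right\rangle + \left\langle G_{n,\lambda_n}^{-1}u_n,\xi\right\rangle$$

where $u_n = \frac1n\sum_{i=1}^n\epsilon_i\eta_i$. Decomposing $\mathcal{H}$ into $\mathcal{H} = \mathrm{Ran}(U_n) \oplus \mathrm{Ran}(U_n)^\perp$ one can write

$$\begin{aligned}
F(\mu^n) - F(\mu^\dagger) &= \left\langle G_{n,\lambda_n}^{-1}U_n\mu^\dagger - \chi_{\mathrm{Ran}(U_n)}\mu^\dagger,\xi\right\rangle - \left\langle\chi_{\mathrm{Ran}(U_n)^\perp}\mu^\dagger,\xi\right\rangle + \left\langle G_{n,\lambda_n}^{-1}u_n,\xi\right\rangle \\
&= -\sum_{j=1}^n\frac{\lambda_n\tau_j^{(n)}}{\beta_j^{(n)} + \lambda_n\tau_j^{(n)}}\left\langle\mu^\dagger,\psi_j^{(n)}\right\rangle\left\langle\psi_j^{(n)},\xi\right\rangle - \left\langle\chi_{\mathrm{Ran}(U_n)^\perp}\mu^\dagger,\xi\right\rangle + \left\langle G_{n,\lambda_n}^{-1}u_n,\xi\right\rangle \qquad (11)
\end{aligned}$$

where $\chi_{\mathrm{Ran}(U_n)}$ is the projection onto $\mathrm{Ran}(U_n)$. If we assume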


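$$\lim_{n\to\infty}\sum_{j=1}^n\frac{1}{\beta_j^{(n)}} < \infty, \qquad (12)$$

then

$$\left|\sum_{j=1}^n\frac{\lambda_n\tau_j^{(n)}}{\beta_j^{(n)} + \lambda_n\tau_j^{(n)}}\left\langle\mu^\dagger,\psi_j^{(n)}\right\rangle\left\langle\psi_j^{(n)},\xi\right\rangle\right| \le \lambda_n\sum_{j=1}^n\frac{1}{\beta_j^{(n)}}\left|\left\langle\mu^\dagger,\psi_j^{(n)}\right\rangle\right|\left|\left\langle\psi_j^{(n)},\xi\right\rangle\right| \le \lambda_n\|\mu^\dagger\|\,\|\xi\|\sum_{j=1}^n\frac{1}{\beta_j^{(n)}},$$

and therefore the first term in (11) is of order $n^{-p}$. By the proof of Lemma 3.12 the third term in (11) is of order $\frac{1}{\lambda_n\sqrt{n}}$. The second term is independent of $\lambda_n$. The optimal rate of convergence is therefore found by balancing the first and third terms. This will imply an optimal choice of $p = \frac14$. We summarise in the following proposition.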
Proposition 3.15. Under Assumptions 1-6, for $F \in \mathcal{H}^*$ take $\xi \in \mathcal{H}$ such that $F(\mu) = \langle\mu,\xi\rangle$, and assume (12) holds and that there exists $q > 0$ such that

$$\|\mu^\dagger\| - \|\chi_{\mathrm{Ran}(U_n)}\mu^\dagger\| \lesssim n^{-q}$$

where $U_n$ is defined by (9) and $(\beta_j^{(n)},\psi_j^{(n)})$ are eigenvalue-eigenfunction pairs for $U_n$. Then

$$\mathbb{E}\left[|F(\mu^n) - F(\mu^\dagger)| \,\middle|\, Q_n\right] = O(n^{-p}) + O(n^{-q}) + O\left(\frac{1}{\lambda_n\sqrt{n}}\right) \quad\text{almost surely.} \qquad (13)$$

In particular the optimal choice is $p = \frac14$, in which case the rate of convergence is

$$\mathbb{E}\left[|F(\mu^n) - F(\mu^\dagger)| \,\middle|\, Q_n\right] = O\left(n^{\max\{-\frac14,-q\}}\right).$$

Proof. The argument preceding the proposition provides the proof for the first term in (13), and the third term is a consequence of Lemma 3.12. The second term follows easily from

$$\left|\left\langle\chi_{\mathrm{Ran}(U_n)^\perp}\mu^\dagger,\xi\right\rangle\right| \le \|\xi\|\,\|\chi_{\mathrm{Ran}(U_n)^\perp}\mu^\dagger\| \le \|\xi\|\left(\|\mu^\dagger\| - \|\chi_{\mathrm{Ran}(U_n)}\mu^\dagger\|\right).$$

The optimal rate is a consequence of choosing $p$ that minimizes $n^{-p} + n^{p-\frac12}$. □

The conditions of the above proposition are difficult to verify theoretically. Even for the special spline problem the authors know of no method to check whether assumption (12) holds and whether such a $q$ exists. We leave further investigation into the rate of convergence for future work.
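The exponent bookkeeping behind Proposition 3.15 can be written out explicitly. Assuming $\lambda_n = n^{-p}$, the three terms in (13) are $O(n^{-p})$, $O(n^{-q})$ and $O(n^{p-1/2})$, so the bound is $O(n^{e(p,q)})$ with $e(p,q) = \max\{-p,-q,p-\frac12\}$. The sketch below minimises $e$ over a grid of rational $p$ for the illustrative value $q = \frac13$, using exact fractions; the function name is hypothetical.

```python
from fractions import Fraction

def rate_exponent(p, q):
    # exponent e with bound (13) = O(n^e) when lambda_n = n^{-p}:
    # terms n^{-p}, n^{-q} and 1/(lambda_n sqrt(n)) = n^{p - 1/2}
    return max(-p, -q, p - Fraction(1, 2))

q = Fraction(1, 3)                          # illustrative value of q
grid = [Fraction(k, 100) for k in range(1, 51)]
best_p = min(grid, key=lambda p: rate_exponent(p, q))
print(best_p, rate_exponent(best_p, q))
```

With $q = \frac13$ this prints `1/4 -1/4`, matching the choice $p = \frac14$ and the rate $n^{\max\{-1/4,-q\}}$.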


3.4 Sharpness of the Scaling Regime - Proof of Theorem 3.3

Proof of Theorem 3.3. Fix any $\alpha > 0$; without loss of generality we can choose $\{\eta_t\}_{t\in X}$ such that $\|\eta_t\| = \alpha$ for all $t \in X$. Define $L_t \in \mathcal{H}^*$ by $L_t = \langle\eta_t,\cdot\rangle$.

In the proof of Lemma 2.8 we showed

$$|\langle G_{n,\lambda_n}\mu,\nu\rangle| \le (\alpha^2 + \lambda_n)\|\mu\|\,\|\nu\|.$$

Letting $\nu = G_{n,\lambda_n}\mu$, for $\mu \in \mathrm{span}\{\eta_i\}$, one has

$$\|G_{n,\lambda_n}\mu\|^2 \le (\alpha^2 + \lambda_n)\|\mu\|\,\|G_{n,\lambda_n}\mu\|.$$
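as $\lambda_n^2n \to 0$. Hence by taking expectations:

$$\mathbb{E}\left\|\frac1n\sum_{i=1}^n\epsilon_iG_{n,\lambda_n}^{-1}\eta_i\right\|^2 \to \infty.$$

By noting

$$\mathbb{E}\left[\left\langle G_{n,\lambda_n}^{-1}U_n\mu^\dagger,\frac1n\sum_{i=1}^n\epsilon_iG_{n,\lambda_n}^{-1}\eta_i\right\rangle\right] = 0, \qquad \mathbb{E}\|\mu^n\|^2 = \mathbb{E}\|G_{n,\lambda_n}^{-1}U_n\mu^\dagger\|^2 + \mathbb{E}\left\|\frac1n\sum_{i=1}^n\epsilon_iG_{n,\lambda_n}^{-1}\eta_i\right\|^2,$$

we conclude the proof. □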


4 Application to the Special Spline Model

Consider the application to the special spline case, $L_t\mu = \mu(t)$. We let

$$\mathcal{H} = H^m := \left\{\mu : [0,1] \to \mathbb{R} \text{ s.t. } \nabla^i\mu \text{ abs. cts. for } i = 1, 2, \dots, m-1 \text{ and } \nabla^m\mu \in L^2\right\}.$$

For $m \ge 1$, $\mathcal{H}$ is a reproducing kernel Hilbert space and therefore the $L_t$ as defined are linear and bounded operators on $\mathcal{H}$. See [5,44] for more details on reproducing kernel Hilbert spaces. The special spline solution is the minimizer of

$$f_n(\mu) = \frac1n\sum_{i=1}^n(y_i - \mu(t_i))^2 + \lambda_n\|\nabla^m\mu\|_{L^2}^2$$

over all $\mu \in H^m$. It can be shown that the minimizer of $f_n$ is a piecewise polynomial of degree $2m-1$ in each interval $(t_i,t_{i+1})$ for $i = 0,\dots,n$ (where we define $t_0 = 0$ and $t_{n+1} = 1$); see, for example, [44, Section 1.3].

This section discusses the following points.

1. The decomposition $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$ where $\mathcal{H}_0$ is finite dimensional.

2. The functions $\eta_t$ corresponding to $\langle\eta_t,\mu\rangle = L_t\mu = \mu(t)$.

The other assumptions needed to apply Theorem 3.1 are Assumption 3 and Assumption 6. Assumption 3 is: if $\mu(t) = \mu(r)$ for all polynomials $\mu$ of degree at most $m-1$ then $t = r$, which clearly holds. Assumption 6 becomes

$$\int_0^1|\nu(t)|^2\,\phi_T(dt) = 0 \implies \nu = 0,$$

which, for example, is true if $\phi_T(dt) = \phi_T(t)\,dt$ and $\phi_T(t) > 0$ for all $t \in [0,1]$.


1. The decomposition $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$. For $\mu \in \mathcal{H}$, by Taylor expanding $\mu$ from 0 we can write

$$\mu(t) = \sum_{i=0}^{m-1}\frac{t^i}{i!}\nabla^i\mu(0) + R(t)$$

where $\nabla^iR(0) = 0$ for all $i = 0, 1, \dots, m-1$. Hence $R \in \mathcal{H}_1$ where

$$\mathcal{H}_1 = \left\{\mu \in H^m : \nabla^i\mu(0) = 0 \text{ for all } i = 0, 1, \dots, m-1\right\}.$$

A Poincaré inequality holds on this space, so $\|\mu\|_1^2 = \int_0^1|\nabla^m\mu(t)|^2\,dt$ is a norm on $\mathcal{H}_1$. We define $\mathcal{H}_0$ to be the span of the functions $\zeta_i$ defined by

$$\zeta_i(t) = \frac{t^i}{i!}$$

for $i = 0, 1, \dots, m-1$. The space is equipped with the inner product

$$\langle\mu,\nu\rangle_0 = \sum_{i=0}^{m-1}\nabla^i\mu(0)\,\nabla^i\nu(0).$$

The space $\mathcal{H}_0$ has $\dim(\mathcal{H}_0) = m$.


2. The functions $\eta_t$. In the above, $R$ is given by

$$R(t) = \int_0^t\frac{(t-u)^{m-1}}{(m-1)!}\nabla^m\mu(u)\,du = \int_0^1 G(t,u)\,\nabla^m\mu(u)\,du$$

where $(u)_+ = \max\{0,u\}$ and

$$G(t,u) = \frac{(t-u)_+^{m-1}}{(m-1)!}$$

is the Green's function for $\nabla^m\mu = u$ and boundary conditions $\nabla^j\mu(0) = 0$ for all $0 \le j \le m-1$.
We claim that the $\eta_t \in H^m$ satisfying $\langle\eta_t,\mu\rangle = \mu(t)$ are given by

$$\eta_t(r) = \sum_{i=0}^{m-1}\zeta_i(t)\zeta_i(r) + \int_0^1 G(t,u)G(r,u)\,du =: \eta_t^0(r) + \eta_t^1(r).$$

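Furthermore $\eta_t^0 \in \mathcal{H}_0$ and $\eta_t^1 \in \mathcal{H}_1$ for all $t \in [0,1]$. The proof follows directly from calculating

$$\langle\eta_t,\mu\rangle = \sum_{i=0}^{m-1}\nabla^i\eta_t(0)\,\nabla^i\mu(0) + \int_0^1\nabla^m\eta_t(u)\,\nabla^m\mu(u)\,du$$

and noticing

$$\nabla^i\eta_t(0) = \sum_{j=0}^{m-1}\zeta_j(t)\left[\nabla^i\zeta_j(r)\right]_{r=0} = \zeta_i(t) \quad\text{for } i < m,$$

$$\nabla^m\eta_t(r) = \nabla_r^m\int_0^1 G(t,u)G(r,u)\,du = G(t,r),$$

where the first identity uses that $\eta_t^1 \in \mathcal{H}_1$ has vanishing derivatives of order less than $m$ at $0$. Substituting these into the inner product and comparing with the Taylor expansion above gives $\langle\eta_t,\mu\rangle = \sum_{i=0}^{m-1}\zeta_i(t)\nabla^i\mu(0) + \int_0^1 G(t,u)\nabla^m\mu(u)\,du = \mu(t)$.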
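The reproducing property $\langle\eta_t,\mu\rangle = \mu(t)$ claimed above can be checked numerically when $\mu$ is a polynomial, since both terms of the inner product are then easy to evaluate. The sketch below assumes $\mu$ is given by its monomial coefficients (a representation chosen here for illustration) and uses $\zeta_i(t) = t^i/i!$ and $G(t,u) = (t-u)_+^{m-1}/(m-1)!$. Because $G(t,\cdot)$ vanishes on $(t,1]$, the integral is taken over $[0,t]$ with composite Simpson's rule, which is exact here up to rounding because the integrands are at most quadratic. The helper names are hypothetical.

```python
from math import factorial

def poly_eval(c, x):
    # c[k] is the coefficient of x**k
    return sum(ck * x ** k for k, ck in enumerate(c))

def poly_deriv(c, order):
    # coefficients of the order-th derivative
    for _ in range(order):
        c = [k * ck for k, ck in enumerate(c)][1:]
    return c or [0.0]

def green(t, u, m):
    # G(t, u) = (t - u)_+^{m-1} / (m-1)!, taken as a left limit at u = t so that m = 1 gives 1
    return (t - u) ** (m - 1) / factorial(m - 1) if u <= t else 0.0

def simpson(f, a, b, N=200):
    # composite Simpson's rule (exact for cubic integrands up to rounding)
    h = (b - a) / N
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, N))
    return s * h / 3

def inner_eta_t(c, t, m):
    # <eta_t, mu> = sum_i zeta_i(t) D^i mu(0) + int_0^t G(t, u) D^m mu(u) du
    h0 = sum(t ** i / factorial(i) * poly_eval(poly_deriv(c, i), 0.0) for i in range(m))
    dm = poly_deriv(c, m)
    h1 = simpson(lambda u: green(t, u, m) * poly_eval(dm, u), 0.0, t)
    return h0 + h1

mu = [1.0, 2.0, -1.0, 0.5]        # mu(x) = 1 + 2x - x^2 + 0.5x^3 (illustrative)
for m in (1, 2, 3):
    for t in (0.0, 0.3, 0.77, 1.0):
        print(m, t, abs(inner_eta_t(mu, t, m) - poly_eval(mu, t)) < 1e-10)
```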

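One can easily show that $\|\eta_t\| \lesssim 1$ for all $t \in [0,1]$: by the reproducing property, $\|\eta_t\|^2 = \eta_t(t) = \sum_{i=0}^{m-1}\zeta_i(t)^2 + \int_0^1 G(t,u)^2\,du$, which is bounded uniformly in $t \in [0,1]$.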
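Since $\zeta_i(t) = t^i/i!$ and $\int_0^t((t-u)^{m-1}/(m-1)!)^2\,du = t^{2m-1}/((2m-1)[(m-1)!]^2)$, the quantity $\|\eta_t\|^2 = \langle\eta_t,\eta_t\rangle = \eta_t(t)$ has a closed form that is increasing in $t$, so its supremum over $[0,1]$ is attained at $t = 1$. This exact-arithmetic sketch evaluates it; the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial

def eta_norm_sq(t, m):
    # ||eta_t||^2 = eta_t(t) = sum_{i<m} zeta_i(t)^2 + int_0^t ((t-u)^{m-1}/(m-1)!)^2 du
    t = Fraction(t)
    h0 = sum((t ** i / factorial(i)) ** 2 for i in range(m))
    h1 = t ** (2 * m - 1) / ((2 * m - 1) * factorial(m - 1) ** 2)
    return h0 + h1

for m in (1, 2, 3):
    print(m, eta_norm_sq(1, m))
```

For $m = 1, 2, 3$ it prints $2$, $7/3$ and $23/10$, so $\|\eta_t\|$ is uniformly bounded on $[0,1]$, although the bound can exceed $1$.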

Continuity of $\eta_t^0$ follows easily. As each polynomial is Lipschitz continuous on the interval $[0,1]$, there exists a constant $C_i$ (depending on the order of the polynomial $i$) such that $|\zeta_i(t) - \zeta_i(s)| \le C_i|t - s|$. Now for the integral term let $m \ge 2$ and $s > t$; then

$$\begin{aligned}
\left|\int_0^1(G(s,u) - G(t,u))\,G(r,u)\,du\right| &= \left|\int_0^1\left(\frac{(s-u)_+^{m-1}}{(m-1)!} - \frac{(t-u)_+^{m-1}}{(m-1)!}\right)G(r,u)\,du\right| \\
&\le \int_t^s\frac{(s-u)^{m-1}}{(m-1)!}\,G(r,u)\,du + \frac{1}{(m-2)!}\int_0^t|s-t|\,G(r,u)\,du \\
&\le \frac{m|s-t|}{[(m-1)!]^2}.
\end{aligned}$$

The case $m = 1$ is similar. It follows that $\|L_s - L_t\|_{\mathcal{H}^*} = \|\eta_s - \eta_t\| \le C|s - t|$ for some $C < \infty$ and hence $L_t$ is continuous.
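The Lipschitz-type bound for the integral term can be spot-checked numerically. The sketch below integrates $(G(s,u) - G(t,u))G(r,u)$ with composite Simpson's rule on the pieces between the kinks at $t$, $s$ and $r$ (an implementation choice made here), and compares the result with $m|s-t|/[(m-1)!]^2$ for $m = 2, 3, 4$ and random $0 \le t \le s \le 1$, $r \in [0,1]$. It illustrates the inequality and is not a proof.

```python
import random
from math import factorial

def G(t, u, m):
    # Green's function (t - u)_+^{m-1} / (m-1)!, continuous in u for m >= 2
    return (t - u) ** (m - 1) / factorial(m - 1) if u < t else 0.0

def piecewise_simpson(f, breaks, N=200):
    # composite Simpson's rule on each interval between consecutive break points
    total = 0.0
    for a, b in zip(breaks, breaks[1:]):
        h = (b - a) / N
        s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, N))
        total += s * h / 3
    return total

rng = random.Random(8)
ok = True
for m in (2, 3, 4):
    for _ in range(50):
        t, s = sorted(rng.random() for _ in range(2))
        r = rng.random()
        breaks = sorted({0.0, t, s, r, 1.0})
        lhs = abs(piecewise_simpson(lambda u: (G(s, u, m) - G(t, u, m)) * G(r, u, m), breaks))
        ok &= lhs <= m * (s - t) / factorial(m - 1) ** 2 + 1e-9
print(ok)
```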